llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00

Author	SHA1	Message	Date
Eric Christopher	724dc848e4	Use the cached subtarget in PPCFrameLowering. llvm-svn: 227548	2015-01-30 02:11:26 +00:00
Eric Christopher	ff32521d30	Migrate some of PPC away from the use of bare getSubtarget/getSubtargetImpl. llvm-svn: 227547	2015-01-30 02:11:24 +00:00
Eric Christopher	4cbf90841c	Migrage PPCRegisterInfo to use the cached subtarget. llvm-svn: 227546	2015-01-30 02:11:21 +00:00
Rafael Espindola	16f3006ec0	Compute the ELF SectionKind from the flags. Any code creating an MCSectionELF knows ELF and already provides the flags. SectionKind is an abstraction used by common code that uses a plain MCSection. Use the flags to compute the SectionKind. This removes a lot of guessing and boilerplate from the MCSectionELF construction. llvm-svn: 227476	2015-01-29 17:33:21 +00:00
Bill Schmidt	f10debced6	[PowerPC] Complete setting the baseline for ppc64le Patch by Nemanja Ivanovic. As was uncovered by the failing test case (when run on non-PPC platforms), the feature set when compiling with -march=ppc64le was not being picked up. This change ensures that if the -mcpu option is not specified, the correct feature set is picked up regardless of whether we are on PPC or not. llvm-svn: 227455	2015-01-29 15:59:09 +00:00
Eric Christopher	aacfef65cf	Move DataLayout back to the TargetMachine from TargetSubtargetInfo derived classes. Since global data alignment, layout, and mangling is often based on the DataLayout, move it to the TargetMachine. This ensures that global data is going to be layed out and mangled consistently if the subtarget changes on a per function basis. Prior to this all targets() have had subtarget dependent code moved out and onto the TargetMachine. One target hasn't been migrated as part of this change: R600. The R600 port has, as a subtarget feature, the size of pointers and this affects global data layout. I've currently hacked in a FIXME to enable progress, but the port needs to be updated to either pass the 64-bitness to the TargetMachine, or fix the DataLayout to avoid subtarget dependent features. llvm-svn: 227113	2015-01-26 19:03:15 +00:00
Bill Schmidt	2fc9b38d49	[PowerPC] Reset the baseline for ppc64le to be equivalent to pwr8 Test by Nemanja Ivanovic. Since ppc64le implies POWER8 as a minimum, it makes sense that the same features are included. Since the pwr8 processor model will likely be getting new features until the implementation is complete, I created a new list to add these updates to. This will include them in both pwr8 and ppc64le. Furthermore, it seems that it would make sense to compose the feature lists for other processor models (pwr3 and up). Per discussion in the review, I will make this change in a subsequent patch. In order to test the changes, I've added an additional run step to test cases that specify -march=ppc64le -mcpu=pwr8 to omit the -mcpu option. Since the feature lists are the same, the behaviour should be unchanged. llvm-svn: 227053	2015-01-25 18:05:42 +00:00
Rafael Espindola	a02b7738f7	Add r224985 back with fixes. The fixes are to note that AArch64 has additional restrictions on when local relocations can be used. In particular, ld64 requires that relocations to cstring/cfstrings use linker visible symbols. Original message: In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 226503	2015-01-19 21:11:14 +00:00
Hal Finkel	3eec8e9415	[PowerPC] Minor correction to r226432 We don't need to exclude patchpoints from the implicit r2 dependence in FastISel because it is added as an implicit operand and, thus, should not confuse that StackMap code. By inspection / no test case. llvm-svn: 226434	2015-01-19 07:44:45 +00:00
Hal Finkel	89328f88bf	[PowerPC] Add r2 as an operand for all calls under both PPC64 ELF V1 and V2 Our PPC64 ELF V2 call lowering logic added r2 as an operand to all direct call instructions in order to represent the dependency on the TOC base pointer value. Restricting this to ELF V2, however, does not seem to make sense: calls under ELF V1 have the same dependence, and indirect calls have an r2 dependence just as direct ones. Make sure the dependence is noted for all calls under both ELF V1 and ELF V2. llvm-svn: 226432	2015-01-19 07:20:27 +00:00
David Blaikie	8839ae1559	std::unique_ptrify the MCStreamer argument to createAsmPrinter llvm-svn: 226414	2015-01-18 20:29:04 +00:00
Hal Finkel	847797329b	[PowerPC] Don't hard-code R2 as register when processing TOC relocations Instructions that have high-order TOC relocations always carry R2 as their base register, so it does not matter whether we take the register from the instruction or just hard-code it in PPCAsmPrinter. In the future, however, we might want to apply these relocations to instructions using a different register, so taking the register from the instruction is a better thing to do. No change in functionality here, however. llvm-svn: 226403	2015-01-18 15:59:44 +00:00
Hal Finkel	0f5a2dae42	[PowerPC] Add some FIXMEs for fastcc and FPR <-> GPR moves So we don't forget, once we support FPR <-> GPR moves on the P8, we'll likely want to re-visit this part of the calling convention. llvm-svn: 226401	2015-01-18 14:31:10 +00:00
Hal Finkel	e9b3a1a030	[PowerPC] Initial PPC64 calling-convention changes for fastcc The default calling convention specified by the PPC64 ELF (V1 and V2) ABI is designed to work with both prototyped and non-prototyped/varargs functions. As a result, GPRs and stack space are allocated for every argument, even those that are passed in floating-point or vector registers. GlobalOpt::OptimizeFunctions will transform local non-varargs functions (that do not have their address taken) to use the 'fast' calling convention. When functions are using the 'fast' calling convention, don't allocate GPRs for arguments passed in other types of registers, and don't allocate stack space for arguments passed in registers. Other changes for the fast calling convention may be added in the future. llvm-svn: 226399	2015-01-18 12:08:47 +00:00
Chandler Carruth	c47432114d	[PM] Split the LoopInfo object apart from the legacy pass, creating a LoopInfoWrapperPass to wire the object up to the legacy pass manager. This switches all the clients of LoopInfo over and paves the way to port LoopInfo to the new pass manager. No functionality change is intended with this iteration. llvm-svn: 226373	2015-01-17 14:16:18 +00:00
Hal Finkel	470187b350	[PowerPC] Don't list R11 as a patchpoint scratch register R11's status is the same under both the PPC64 ELF V1 and V2 ABIs: it is reserved for use as an "environment pointer" for compilation models that require such a thing. We don't, we also don't need a second scratch register, and because we support only "local" patchpoint call targets, we might as well let R11 be used for anyregcc patchpoints. llvm-svn: 226369	2015-01-17 03:57:34 +00:00
Hal Finkel	04316a019c	[PowerPC] Adjust PatchPoints for ppc64le Bill Schmidt pointed out that some adjustments would be needed to properly support powerpc64le (using the ELF V2 ABI). For one thing, R11 is not available as a scratch register, so we need to use R12. R12 is also available under ELF V1, so to maintain consistency, I flipped the order to make R12 the first scratch register in the array under both ABIs. llvm-svn: 226247	2015-01-16 04:40:58 +00:00
Hal Finkel	dcf8b14857	[PowerPC] Loosen ELFv1 PPC64 func descriptor loads for indirect calls Function pointers under PPC64 ELFv1 (which is used on PPC64/Linux on the POWER7, A2 and earlier cores) are really pointers to a function descriptor, a structure with three pointers: the actual pointer to the code to which to jump, the pointer to the TOC needed by the callee, and an environment pointer. We used to chain these loads, and make them opaque to the rest of the optimizer, so that they'd always occur directly before the call. This is not necessary, and in fact, highly suboptimal on embedded cores. Once the function pointer is known, the loads can be performed ahead of time; in fact, they can be hoisted out of loops. Now these function descriptors are almost always generated by the linker, and thus the contents of the descriptors are invariant. As a result, by default, we'll mark the associated loads as invariant (allowing them to be hoisted out of loops). I've added a target feature to turn this off, however, just in case someone needs that option (constructing an on-stack descriptor, casting it to a function pointer, and then calling it cannot be well-defined C/C++ code, but I can imagine some JIT-compilation system doing so). Consider this simple test: $ cat call.c typedef void (fp)(); void bar(fp x) { for (int i = 0; i < 1600000000; ++i) x(); } $ cat main.c typedef void (fp)(); void bar(fp x); void foo() {} int main() { bar(foo); } On the PPC A2 (the BG/Q supercomputer), marking the function-descriptor loads as invariant brings the execution time down to ~8 seconds from ~32 seconds with the loads in the loop. The difference on the POWER7 is smaller. Compiling with: gcc -std=c99 -O3 -mcpu=native call.c main.c : ~6 seconds [this is 4.8.2] clang -O3 -mcpu=native call.c main.c : ~5.3 seconds clang -O3 -mcpu=native call.c main.c -mno-invariant-function-descriptors : ~4 seconds (looks like we'd benefit from additional loop unrolling here, as a first guess, because this is faster with the extra loads) The -mno-invariant-function-descriptors will be added to Clang shortly. llvm-svn: 226207	2015-01-15 21:17:34 +00:00
Chandler Carruth	88fd126216	[PM] Separate the TargetLibraryInfo object from the immutable pass. The pass is really just a means of accessing a cached instance of the TargetLibraryInfo object, and this way we can re-use that object for the new pass manager as its result. Lots of delta, but nothing interesting happening here. This is the common pattern that is developing to allow analyses to live in both the old and new pass manager -- a wrapper pass in the old pass manager emulates the separation intrinsic to the new pass manager between the result and pass for analyses. llvm-svn: 226157	2015-01-15 10:41:28 +00:00
Chandler Carruth	49a7633378	[PM] Move TargetLibraryInfo into the Analysis library. While the term "Target" is in the name, it doesn't really have to do with the LLVM Target library -- this isn't an abstraction which LLVM targets generally need to implement or extend. It has much more to do with modeling the various runtime libraries on different OSes and with different runtime environments. The "target" in this sense is the more general sense of a target of cross compilation. This is in preparation for porting this analysis to the new pass manager. No functionality changed, and updates inbound for Clang and Polly. llvm-svn: 226078	2015-01-15 02:16:27 +00:00
Hal Finkel	28c9c891ec	[PowerPC] Add assembler support for mcrfs and friends Fill out our support for the floating-point status and control register instructions (mcrfs and friends). As it turns out, these are necessary for compiling src/test/harness_fp.h in TBB for PowerPC. Thanks to Raf Schietekat for reporting the issue! llvm-svn: 226070	2015-01-15 01:00:53 +00:00
Bill Schmidt	6d9693229c	[PPC64] Add support for the ICBT instruction on POWER8. Patch by Kit Barton. Support for the ICBT instruction is currently present, but limited to embedded processors. This change adds a new FeatureICBT that can be used to identify whether the ICBT instruction is available on a specific processor. Two new tests are added: * Positive test to ensure the icbt instruction is present when using -mcpu=pwr8 * Negative test to ensure the icbt instruction is not generated when using -mcpu=pwr7 Both test cases use the Prefetch opcode in LLVM. They are based on the ppc64-prefetch.ll test case. llvm-svn: 226033	2015-01-14 20:17:10 +00:00
Rafael Espindola	24f46a1c22	Revert "Add r224985 back with two fixes." This reverts commit r225644 while I debug a regression. llvm-svn: 226022	2015-01-14 19:07:23 +00:00
Chandler Carruth	0b619fcc8e	[cleanup] Re-sort all the #include lines in LLVM using utils/sort_includes.py. I clearly haven't done this in a while, so more changed than usual. This even uncovered a missing include from the InstrProf library that I've added. No functionality changed here, just mechanical cleanup of the include order. llvm-svn: 225974	2015-01-14 11:23:27 +00:00
Hal Finkel	a11b7ea471	Revert "r225811 - Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support"" This re-applies r225808, fixed to avoid problems with SDAG dependencies along with the preceding fix to ScheduleDAGSDNodes::RegDefIter::InitNodeNumDefs. These problems caused the original regression tests to assert/segfault on many (but not all) systems. Original commit message: This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225909	2015-01-14 01:07:51 +00:00
Ulrich Weigand	3ec182626d	Use the integrated assembler as default on PowerPC This was already done in clang, this commit now uses the integrated assembler as default when using LLVM tools directly. A number of test cases using inline asm had to be adapted, either by updating the expected output, or by using -no-integrated-as (for such tests that deliberately use an invalid instruction in inline asm). llvm-svn: 225819	2015-01-13 19:43:45 +00:00
Hal Finkel	c6fdfe466f	Revert "r225808 - [PowerPC] Add StackMap/PatchPoint support" Reverting this while I investiage buildbot failures (segfaulting in GetCostForDef at ScheduleDAGRRList.cpp:314). llvm-svn: 225811	2015-01-13 18:25:05 +00:00
Hal Finkel	8b0f03b1a3	[PowerPC] Add missing override keyword llvm-svn: 225809	2015-01-13 18:02:22 +00:00
Hal Finkel	ed17decbc6	[PowerPC] Add StackMap/PatchPoint support This commit does two things: 1. Refactors PPCFastISel to use more of the common infrastructure for call lowering (this lets us take advantage of this common code for lowering some common intrinsics, stackmap/patchpoint among them). 2. Adds support for stackmap/patchpoint lowering. For the most part, this is very similar to the support in the AArch64 target, with the obvious differences (different registers, NOP instructions, etc.). The test cases are adapted from the AArch64 test cases. One difference of note is that the patchpoint call sequence takes 24 bytes, so you can't use less than that (on AArch64 you can go down to 16). Also, as noted in the docs, we take the patchpoint address to be the actual code address (assuming the call is local in the TOC-sharing sense), which should yield higher performance than generating the full cross-DSO indirect-call sequence and is likely just as useful for JITed code (if not, we'll change it). StackMaps and Patchpoints are still marked as experimental, and so this support is doubly experimental. So go ahead and experiment! llvm-svn: 225808	2015-01-13 17:48:12 +00:00
Hal Finkel	155e99f749	[PowerPC] Split the blr definition into BLR and BLR8 We really need a separate 64-bit version of this instruction so that it can be marked as clobbering LR8 (instead of just LR). No change in functionality (although the verifier might be slightly happier), however, it is required for stackmap/patchpoint support. Thus, this will be covered by stackmap test cases once those are added. llvm-svn: 225804	2015-01-13 17:47:54 +00:00
Hal Finkel	512c8bea3a	[PowerPC] Add DWARF numbers for CA (XER), etc. For registers that have DWARF numbers (like CA, which is really part of XER), add them. Also, RM is not an SPR, and the declaration hack (where it is declared as an SPR with an arbitrary number) is not needed, so just declare it as a register. NFC; although CA's register number will be needed when stackmap/patchpoint support is added. llvm-svn: 225800	2015-01-13 17:45:11 +00:00
Olivier Sallenave	b0c1223b39	Added TLI hook for isFPExtFree. Some of the FMA combine heuristics are now guarded with that hook. llvm-svn: 225795	2015-01-13 15:06:36 +00:00
Rafael Espindola	b90e1b4a20	Add r224985 back with two fixes. One is that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. The other is that ld64 requires the relocations to cstring to use linker visible symbols on AArch64. Thanks to Michael Zolotukhin for testing this! Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225644	2015-01-12 18:13:07 +00:00
Hal Finkel	08f3b7b05f	[PowerPC] Fix calls to non-function objects Looking at r225438 inspired me to see how the PowerPC backend handled the situation (calling a bitcasted TLS global), and it turns out we also produced an error (cannot select ...). What it means to "call" something that is not a function is implementation and platform specific, but in the name of doing something (besides crashing), this makes sure we do what GCC does (treat all such calls as calls through a function pointer -- meaning that the pointer is assumed, as is the convention on PPC, to point to a function descriptor structure holding the actual code address along with the function's TOC pointer and environment pointer). As GCC does, we now do the same for calling regular (non-TLS) non-function globals too. I'm not sure whether this is the most useful way to define the behavior, but at least we won't be alone. llvm-svn: 225617	2015-01-12 04:34:47 +00:00
Hal Finkel	0a750c5cd8	[PowerPC] Mark zext of a small scalar load as free This initial implementation of PPCTargetLowering::isZExtFree marks as free zexts of small scalar loads (that are not sign-extending). This callback is used by SelectionDAGBuilder's RegsForValue::getCopyToRegs, and thus to determine whether a zext or an anyext is used to lower illegally-typed PHIs. Because later truncates of zero-extended values are nops, this allows for the elimination of later unnecessary truncations. Fixes the initial complaint associated with PR22120. llvm-svn: 225584	2015-01-10 08:21:59 +00:00
Justin Hibbits	e74c9aa59c	Remove some whitespace. llvm-svn: 225583	2015-01-10 07:50:31 +00:00
Justin Hibbits	b4eb439b90	Fully fix Bug #22115 . Summary: In the previous commit, the register was saved, but space was not allocated. This resulted in the parameter save area potentially clobbering r30, leading to nasty results. Test Plan: Tests updated Reviewers: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6906 llvm-svn: 225573	2015-01-10 01:57:21 +00:00
Hal Finkel	c563f95905	[PowerPC] Readjust the loop unrolling threshold Now that the way that the partial unrolling threshold for small loops is used to compute the unrolling factor as been corrected, a slightly smaller threshold is preferable. This is expected; other targets may need to re-tune as well. llvm-svn: 225566	2015-01-10 00:31:10 +00:00
Lang Hames	7363918430	Recommit r224935 with a fix for the ObjC++/AArch64 bug that that revision introduced. A test case for the bug was already committed in r225385. Patch by Rafael Espindola. llvm-svn: 225534	2015-01-09 18:55:42 +00:00
Hal Finkel	556331a037	[PowerPC] Enable late partial unrolling on the POWER7 The P7 benefits from not have really-small loops so that we either have multiple dispatch groups in the loop and/or the ability to form more-full dispatch groups during scheduling. Setting the partial unrolling threshold to 44 seems good, empirically, for the P7. Compared to using no late partial unrolling, this yields the following test-suite speedups: SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -66.3253% +/- 24.1975% SingleSource/Benchmarks/Misc-C++/oopack_v1p8 -44.0169% +/- 29.4881% SingleSource/Benchmarks/Misc/pi -27.8351% +/- 12.2712% SingleSource/Benchmarks/Stanford/Bubblesort -30.9898% +/- 22.4647% I've speculatively added a similar setting for the P8. Also, I've noticed that the unroller does not quite calculate the unrolling factor correctly for really tiny loops because it neglects to account for the fact that not every loop body replicant contains an ending branch and counter increment. I'll fix that later. llvm-svn: 225522	2015-01-09 15:51:16 +00:00
Hal Finkel	9965c76405	[PowerPC] Add a flag for experimenting with subreg liveness tracking This cannot yet be enabled by default, it causes ~50 miscompiles in the test suite. llvm-svn: 225497	2015-01-09 02:03:11 +00:00
Hal Finkel	cfa765d60f	[PowerPC] Fold [sz]ext with fp_to_int lowering where possible On modern cores with lfiw[az]x, we can fold a sign or zero extension from i32 to i64 into the load necessary for an i64 -> fp conversion. llvm-svn: 225493	2015-01-09 01:34:30 +00:00
Hal Finkel	cf6cbfbe30	[PowerPC] Mark all instructions as non-cheap for MachineLICM MachineLICM uses a callback named hasLowDefLatency to determine if an instruction def operand has a 'low' latency. If all relevant operands have a 'low' latency, the instruction is considered too cheap to hoist out of loops even in low-register-pressure situations. On PowerPC cores, both the embedded cores and the others, there is no reason to believe that this is a good choice: all instructions have a cost inside a loop, and hoisting them when not limited by register pressure is a reasonable default. llvm-svn: 225471	2015-01-08 22:11:49 +00:00
Justin Hibbits	68fee020d6	Add saving and restoring of r30 to the prologue and epilogue, respectively Summary: The PIC additions didn't update the prologue and epilogue code to save and restore r30 (PIC base register). This does that. Test Plan: Tests updated. Reviewers: hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6876 llvm-svn: 225450	2015-01-08 15:47:19 +00:00
Ahmed Bougacha	4150499cd1	[SelectionDAG] Allow targets to specify legality of extloads' result type (in addition to the memory type). The LoadExt legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421	2015-01-08 00:51:32 +00:00
Ahmed Bougacha	4a6cab694b	[CodeGen] Use MVT iterator_ranges in legality loops. NFC intended. A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392	2015-01-07 21:27:10 +00:00
Hal Finkel	0392a6e252	[PowerPC] Transform a README.txt entry into a FIXME Remove the README.txt entry regarding register allocation of CR logical ops, and replace it with a FIXME in PPCInstrInfo.td. The text in the README.txt was not really accurate, and thanks goes to Pat Haugen (and Bill Schmidt) from IBM for clarifying what was intended and highlighting the relevant text in the ISA specification. llvm-svn: 225325	2015-01-07 00:15:29 +00:00
Lang Hames	7aa6a77beb	Revert r224935 "Refactor duplicated code. No intended functionality change." This is affecting the behavior of some ObjC++ / AArch64 test cases on Darwin. Reverting to get the bots green while I track down the source of the changed behavior. llvm-svn: 225311	2015-01-06 23:04:36 +00:00
Hal Finkel	930e5f41df	[PowerPC] Reuse a load operand in int->fp conversions int->fp conversions on PPC must be done through memory loads and stores. On a modern core, this process begins by storing the int value to memory, then loading it using a (sometimes special) FP load instruction. Unfortunately, we would do this even when the value to be converted was itself a load, and we can just use that same memory location instead of copying it to another first. There is a slight complication when handling int_to_fp(fp_to_int(x)) pairs, because the fp_to_int operand has not been lowered when the int_to_fp is being lowered. We handle this specially by invoking fp_to_int's lowering logic (partially) and getting the necessary memory location (some trivial refactoring was done to make this possible). This is all somewhat ugly, and it would be nice if some later CodeGen stage could just clean this stuff up, but because doing so would involve modifying target-specific nodes (or instructions), it is not immediately clear how that would work. Also, remove a related entry from the README.txt for which we now generate reasonable code. llvm-svn: 225301	2015-01-06 22:31:02 +00:00
Hal Finkel	b0f02c8870	[PowerPC] Remove old README.txt entry regarding struct passing Because of how Clang represents structs as arrays (at least on non-Darwin platforms), and what SROA does, etc. this is no longer a problem. llvm-svn: 225251	2015-01-06 07:23:13 +00:00
Hal Finkel	dd9d2801a3	[PowerPC] Add some missing names in getTargetNodeName These are used for debugging output; NFC. llvm-svn: 225249	2015-01-06 07:02:15 +00:00
Hal Finkel	165bd83ec9	[PowerPC] Improve int_to_fp(fp_to_int(x)) combining The old target DAG combine that allowed for performing int_to_fp(fp_to_int(x)) without a load/store pair is updated here with support for unsigned integers, and to support single-precision values without a third rounding step, on newer cores with the appropriate instructions. llvm-svn: 225248	2015-01-06 06:01:57 +00:00
Lang Hames	6337c82a54	Revert r225048: It broke ObjC on AArch64. I've filed http://llvm.org/PR22100 to track this issue. llvm-svn: 225228	2015-01-06 00:54:32 +00:00
Duncan P. N. Exon Smith	9d538d996b	Revert "Use the integrated assembler by default on 32-bit PowerPC and SPARC" This reverts commit r225213. It's failing on multiple buildbots [1][2]. [1]: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/22032 [2]: http://lab.llvm.org:8080/green/view/Clang/job/clang-stage1-cmake-RA-incremental_check/2357/ llvm-svn: 225222	2015-01-05 23:31:51 +00:00
Hal Finkel	3ac5d36873	[PowerPC] Remove old README.txt entry We no longer generate horrible code for the stated function: void f(signed char a, _Bool b, _Bool c) { signed char t = 0; if (b) t = a; if (c) *a = t; } for which we now generate: .L.f: andi. 5, 5, 1 cmpldi 1, 4, 0 li 5, 0 beq 1, .LBB0_2 lbz 5, 0(3) .LBB0_2: # %if.end bclr 4, 1, 0 stb 5, 0(3) blr so we don't need the README.txt entry. llvm-svn: 225217	2015-01-05 22:20:22 +00:00
Hal Finkel	c2dfc1237c	[PowerPC] Convert a README.txt entry into a better test We now produce the desired code as noted in the README.txt file (no spurious or). Remove the README entry and improve the regression test. llvm-svn: 225214	2015-01-05 21:53:52 +00:00
Brad Smith	fcc570d0ff	Use the integrated assembler by default on 32-bit PowerPC and SPARC llvm-svn: 225213	2015-01-05 21:48:16 +00:00
Hal Finkel	ac0f30ff05	[PowerPC] Remove README.txt entry This entry has been rendered irrelevant now that we have proper CR bit tracking. llvm-svn: 225211	2015-01-05 21:41:26 +00:00
Hal Finkel	e17f6fa925	[PowerPC] Add a test for truncating a shifted load We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225209	2015-01-05 21:33:14 +00:00
Hal Finkel	755b87536f	[PowerPC] Add another test for load/store with update We now produce the desired code as noted in the README.txt file. Remove the README entry and add a regression test. llvm-svn: 225205	2015-01-05 21:22:42 +00:00
Hal Finkel	feb56adb1b	[PowerPC] Fold i1 extensions with other ops Consider this function from our README.txt file: int foo(int a, int b) { return (a < b) << 4; } We now explicitly track CR bits by default, so the comment in the README.txt about not really having a SETCC is no longer accurate, but we did generate this somewhat silly code: cmpw 0, 3, 4 li 3, 0 li 12, 1 isel 3, 12, 3, 0 sldi 3, 3, 4 blr which generates the zext as a select between 0 and 1, and then shifts the result by a constant amount. Here we preprocess the DAG in order to fold the results of operations on an extension of an i1 value into the SELECT_I[48] pseudo instruction when the resulting constant can be materialized using one instruction (just like the 0 and 1). This was not implemented as a DAGCombine because the resulting code would have been anti-canonical and depends on replacing chained user nodes, which does not fit well into the lowering paradigm. Now we generate: cmpw 0, 3, 4 li 3, 0 li 12, 16 isel 3, 12, 3, 0 blr which is less silly. llvm-svn: 225203	2015-01-05 21:10:24 +00:00
Hal Finkel	21f185051b	[PowerPC] Remove zexts after i32 ctlz The 64-bit semantics of cntlzw are not special, the 32-bit population count is stored as a 64-bit value in the range [0,32]. As a result, it is always zero extended, and it can be added to the PPCISelDAGToDAG peephole optimization as a frontier instruction for the removal of unnecessary zero extensions. llvm-svn: 225192	2015-01-05 18:52:29 +00:00
Hal Finkel	c661e1aa45	[PowerPC] Remove zexts after byte-swapping loads lhbrx and lwbrx not only load their data with byte swapping, but also clear the upper 32 bits (at least). As a result, they can be added to the PPCISelDAGToDAG peephole optimization as frontier instructions for the removal of unnecessary zero extensions. llvm-svn: 225189	2015-01-05 18:09:06 +00:00
Hal Finkel	9dea823fc5	[PowerPC] Enable speculation of cttz/ctlz PPC has an instruction for ctlz with defined zero behavior, and our lowering of cttz (provided by DAGCombine) is also efficient and branchless, so speculating these makes sense. llvm-svn: 225150	2015-01-05 05:24:42 +00:00
Hal Finkel	4329a9427d	[PowerPC] Materialize i64 constants using rotation with masking r225135 added the ability to materialize i64 constants using rotations in order to reduce the instruction count. Sometimes we can use a rotation only with some extra masking, so that we take advantage of the fact that generating a bunch of extra higher-order 1 bits is easy using li/lis. llvm-svn: 225147	2015-01-05 03:41:38 +00:00
Hal Finkel	f842e57566	[PowerPC] Materialize i64 constants using rotation Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing a rotated constant, and then applying the inverse rotation, requires fewer instructions than the direct method. If so, do that instead. In r225132, I added support for forming constants using bit inversion. In effect, this reverts that commit and replaces it with rotation support. The bit inversion is useful for turning constants that are mostly ones into ones that are mostly zeros (thus enabling a more-efficient shift-based materialization), but the same effect can be obtained by using negative constants and a rotate, and that is at least as efficient, if not more. llvm-svn: 225135	2015-01-04 15:43:55 +00:00
Hal Finkel	5219773596	[PowerPC] Materialize i64 constants using bit inversion Materializing full 64-bit constants on PPC64 can be expensive, requiring up to 5 instructions depending on the locations of the non-zero bits. Sometimes materializing the bit-reversed constant, and then flipping the bits, requires fewer instructions than the direct method. If so, do that instead. llvm-svn: 225132	2015-01-04 12:35:03 +00:00
Hal Finkel	2aea11b438	[PowerPC/BlockPlacement] Allow target to provide a per-loop alignment preference The existing code provided for specifying a global loop alignment preference. However, the preferred loop alignment might depend on the loop itself. For recent POWER cores, loops between 5 and 8 instructions should have 32-byte alignment (while the others are better with 16-byte alignment) so that the entire loop will fit in one i-cache line. To support this, getPrefLoopAlignment has been made virtual, and can be provided with an optional MachineLoop* so the target can inspect the loop before answering the query. The default behavior, as before, is to return the value set with setPrefLoopAlignment. MachineBlockPlacement now queries the target for each loop instead of only once per function. There should be no functional change for other targets. llvm-svn: 225117	2015-01-03 17:58:24 +00:00
Hal Finkel	bab4c16cdc	[PowerPC] Use 16-byte alignment for modern cores for functions/loops Most modern PowerPC cores prefer that functions and loops start on 16-byte-aligned boundaries (), so instruct block placement, etc. to make this happen. The branch selector has also been adjusted so account for the extra nops that might now be inserted before loop headers. () Some cores actually prefer other alignments for small loops, but that will be addressed in a follow-up commit. llvm-svn: 225115	2015-01-03 14:58:25 +00:00
Craig Topper	de4ba3b22a	Minor cleanup to all the switches after MatchInstructionImpl in all the AsmParsers. Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation. llvm-svn: 225114	2015-01-03 08:16:34 +00:00
Hal Finkel	fa0f576b41	[PowerPC] Add support for the CMPB instruction Newer POWER cores, and the A2, support the cmpb instruction. This instruction compares its operands, treating each of the 8 bytes in the GPRs separately, returning a 'mask' result of 0 (for false) or -1 (for true) in each byte. Code generation support is added, in the form of a PPCISelDAGToDAG DAG-preprocessing routine, that recognizes patterns close to what the instruction computes (either exactly, or related by a constant masking operation), and generates the cmpb instruction (along with any necessary constant masking operation). This can be expanded if use cases arise. llvm-svn: 225106	2015-01-03 01:16:37 +00:00
Hal Finkel	de1be0f87c	[PowerPC] use UINT64_C instead of ul Attempting to fix PR22078 (building on 32-bit systems) by replacing my careless use of 1ul to be a uint64_t constant with UINT64_C(1). llvm-svn: 225066	2015-01-01 19:33:59 +00:00
Hal Finkel	93997c9aa6	[PowerPC] Improve instruction selection bit-permuting operations (64-bit) This is the second installment of improvements to instruction selection for "bit permutation" instruction sequences. r224318 added logic for instruction selection for 32-bit bit permutation sequences, and this adds lowering for 64-bit sequences. The 64-bit sequences are more complicated than the 32-bit ones because: a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by replicating the lower 32-bits of the value-to-be-rotated into the upper 32 bits -- and integrating this into the cost modeling for the various bit group operations is non-trivial b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions cannot, in one instruction, specify the mask starting index, the mask ending index, and the rotation factor. Also, forming arbitrary 64-bit constants is more complicated than in 32-bit mode because the number of instructions necessary is value dependent. Plus, support for 'late masking' was added: it is sometimes more efficient to treat the overall value as if it had no mandatory zero bits when planning the bit-group insertions, and then mask them in at the very end. Unfortunately, as the structure of the bit groups is different in the two cases, the more feasible implementation technique was to generate both instruction sequences, and then pick the shorter one. And finally, we now generate reasonable code for i64 bswap: rldicl 5, 3, 16, 0 rldicl 4, 3, 8, 0 rldicl 6, 3, 24, 0 rldimi 4, 5, 8, 48 rldicl 5, 3, 32, 0 rldimi 4, 6, 16, 40 rldicl 6, 3, 48, 0 rldimi 4, 5, 24, 32 rldicl 5, 3, 56, 0 rldimi 4, 6, 40, 16 rldimi 4, 5, 48, 8 rldimi 4, 3, 56, 0 vs. what we used to produce: li 4, 255 rldicl 5, 3, 24, 40 rldicl 6, 3, 40, 24 rldicl 7, 3, 56, 8 sldi 8, 3, 8 sldi 10, 3, 24 sldi 12, 3, 40 rldicl 0, 3, 8, 56 sldi 9, 4, 32 sldi 11, 4, 40 sldi 4, 4, 48 andi. 5, 5, 65280 andis. 6, 6, 255 andis. 7, 7, 65280 sldi 3, 3, 56 and 8, 8, 9 and 4, 12, 4 and 9, 10, 11 or 6, 7, 6 or 5, 5, 0 or 3, 3, 4 or 7, 9, 8 or 4, 6, 5 or 3, 3, 7 or 3, 3, 4 which is 12 instructions, instead of 25, and seems optimal (at least in terms of code size). llvm-svn: 225056	2015-01-01 02:53:29 +00:00
Rafael Espindola	13ff8033c2	Add r224985 back with a fix. The issues was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. Original message: Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225048	2014-12-31 17:19:34 +00:00
Rafael Espindola	afd829c72b	Revert "Remove doesSectionRequireSymbols." This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044	2014-12-31 16:06:48 +00:00
Rafael Espindola	1db8d30b1f	Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 224985	2014-12-30 13:13:27 +00:00
Rafael Espindola	65834dfd2d	Refactor duplicated code. No intended functionality change. llvm-svn: 224935	2014-12-29 15:18:31 +00:00
David Majnemer	24c569b81d	PowerPC: CTR shouldn't fire if a TLS call is in the loop Determining the address of a TLS variable results in a function call in certain TLS models. This means that a simple ICmpInst might actually result in invalidating the CTR register. In such cases, do not attempt to rely on the CTR register for loop optimization purposes. This fixes PR22034. Differential Revision: http://reviews.llvm.org/D6786 llvm-svn: 224890	2014-12-27 19:45:38 +00:00
Hal Finkel	9f196683c0	[PowerPC] [FastISel] i1 constants must be zero extended When materializing constant i1 values, they must be zero extended. We represent i1 values as [0, 1], not [0, -1], in i32 registers. As it turns out, this code path was dead for i1 values prior to r216006 (which is why this did not manifest in miscompiles until recently). Fixes -O0 self-hosting on PPC64/Linux. llvm-svn: 224842	2014-12-25 23:08:25 +00:00
Hal Finkel	e37bd38154	[PowerPC] Ensure that the TOC reload directly follows bctrl on PPC64 On non-Darwin PPC64, the TOC reload needs to come directly after the bctrl instruction (for indirect calls) because the 'bctrl/ld 2, 40(1)' instruction sequence is interpreted by the unwinding code in libgcc. To make sure these occur as a pair, as with other pairings interpreted by the linker, fuse the two instructions into one instruction (for code generation only). In the future, we might wish to do this by emitting CFI directives instead, but this solution is simpler, and mirrors what GCC does. Additional discussion on this point is contained in the PR. Fixes PR22015. llvm-svn: 224788	2014-12-23 22:29:40 +00:00
Hal Finkel	f0195675c6	[PowerPC] Don't mark the return-address slot as immutable It is tempting to mark the fixed stack slot used to store the return address as immutable when lowering @llvm.returnaddress(i32 0). Unfortunately, within the function, it is not completely immutable: it is written during the function prologue. When using post-RA instruction scheduling, the prologue instructions are available for scheduling, and we're not free to interchange the order of a particular store in the prologue with loads from that stack location. Fixes PR21976. llvm-svn: 224761	2014-12-23 09:45:06 +00:00
Hal Finkel	867d174942	[PowerPC] Don't attempt a 64-bit pow2 division on PPC32 In r224033, in moving the signed power-of-2 division expansion into BuildSDIVPow2, I accidentally made it possible to attempt the lowering for a 64-bit division on PPC32. This later asserts. Fixes PR21928. llvm-svn: 224758	2014-12-23 08:38:50 +00:00
Craig Topper	f49fac3203	[PowerPC] Use MCPhysReg for tables of registers. Const-correct the tables. Only put the anonymous namespace around classes. NFC. llvm-svn: 224498	2014-12-18 05:02:14 +00:00
Will Schmidt	68f6e5d89e	Enable the P8Model entry This was missed last time around, for the P8 Instruction Scheduling changes (223257). This will hook the P8Model entry in so those changes will actually be used. llvm-svn: 224452	2014-12-17 19:56:29 +00:00
Hal Finkel	04ae4c36c5	[PowerPC] Improve instruction selection bit-permuting operations (32-bit) The PowerPC backend, somewhat embarrassingly, did not generate an optimal-length sequence of instructions for a 32-bit bswap. While adding a pattern for the bswap intrinsic to fix this would not have been terribly difficult, doing so would not have addressed the real problem: we had been generating poor code for many bit-permuting operations (by which I mean things like byte-swap that permute the bits of one or more inputs around in various ways). Here are some initial steps toward solving this deficiency. Bit-permuting operations are represented, at the SDAG level, using ISD::ROTL, SHL, SRL, AND and OR (mostly with constant second operands). Looking back through these operations, we can build up a description of the bits in the resulting value in terms of bits of one or more input values (and constant zeros). For each bit, we compute the rotation amount from the original value, and then group consecutive (value, rotation factor) bits into groups. Groups sharing these attributes are then collected and sorted, and we can then instruction select the entire permutation using a combination of masked rotations (rlwinm), imm ands (andi/andis), and masked rotation inserts (rlwimi). The result is that instead of lowering an i32 bswap as: rlwinm 5, 3, 24, 16, 23 rlwinm 4, 3, 24, 0, 7 rlwimi 4, 3, 8, 8, 15 rlwimi 5, 3, 8, 24, 31 rlwimi 4, 5, 0, 16, 31 we now produce: rlwinm 4, 3, 8, 0, 31 rlwimi 4, 3, 24, 16, 23 rlwimi 4, 3, 24, 0, 7 and for the 'test6' example in the PowerPC/README.txt file: unsigned test6(unsigned x) { return ((x & 0x00FF0000) >> 16) \| ((x & 0x000000FF) << 16); } we used to produce: lis 4, 255 rlwinm 3, 3, 16, 0, 31 ori 4, 4, 255 and 3, 3, 4 and now we produce: rlwinm 4, 3, 16, 24, 31 rlwimi 4, 3, 16, 8, 15 and, as a nice bonus, this fixes the FIXME in test/CodeGen/PowerPC/rlwimi-and.ll. This commit does not include instruction-selection for i64 operations, those will come later. llvm-svn: 224318	2014-12-16 05:51:41 +00:00
Hal Finkel	acf8e7a584	[PowerPC] Handle cmp op promotion for SELECT[_CC] nodes in PPCTL::DAGCombineExtBoolTrunc PPCTargetLowering::DAGCombineExtBoolTrunc contains logic to remove unwanted truncations and extensions when dealing with nodes of the form: zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...) There was a FIXME in the implementation (now removed) regarding the fact that the function would abort the transformations if any of the non-output operands of a SELECT or SELECT_CC node would need to be promoted (because they were also output operands, for example). As a result, we continued to generate unnecessary zero-extends for code such as this: unsigned foo(unsigned a, unsigned b) { return (a <= b) ? a : b; } which would produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 rldicl 3, 3, 0, 32 blr and now we produce: cmplw 0, 3, 4 isel 3, 4, 3, 1 blr which is better in the obvious way. llvm-svn: 224213	2014-12-14 05:53:19 +00:00
Hal Finkel	30da0a42c8	[PowerPC] Add a DAGToDAG peephole to remove unnecessary zero-exts On PPC64, we end up with lots of i32 -> i64 zero extensions, not only from all of the usual places, but also from the ABI, which specifies that values passed are zero extended. Almost all 32-bit PPC instructions in PPC64 mode are defined to do something to the higher-order bits, and for some instructions, that action clears those bits (thus providing a zero-extended result). This is especially common after rotate-and-mask instructions. Adding an additional instruction to zero-extend the results of these instructions is unnecessary. This PPCISelDAGToDAG peephole optimization examines these zero-extensions, and looks back through their operands to see if all instructions will implicitly zero extend their results. If so, we convert these instructions to their 64-bit variants (which is an internal change only, the actual encoding of these instructions is the same as the original 32-bit ones) and remove the unnecessary zero-extension (changing where the INSERT_SUBREG instructions are to make everything internally consistent). llvm-svn: 224169	2014-12-12 23:59:36 +00:00
Hal Finkel	1b92efa70e	[PowerPC] Better lowering for add/or of a FrameIndex If we have an add (or an or that is really an add), where one operand is a FrameIndex and the other operand is a small constant, we can combine the lowering of the FrameIndex (which is lowered as an add of the FI and a zero offset) with the constant operand. Amusingly, this is an old potential improvement entry from lib/Target/PowerPC/README.txt which had never been resolved. In short, we used to lower: %X = alloca { i32, i32 } %Y = getelementptr {i32,i32}* %X, i32 0, i32 1 ret i32* %Y as: addi 3, 1, -8 ori 3, 3, 4 blr and now we produce: addi 3, 1, -4 blr which is much more sensible. llvm-svn: 224071	2014-12-11 22:51:06 +00:00
Matthias Braun	aa888a6f1e	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059	2014-12-11 21:26:47 +00:00
Rafael Espindola	aa48306a03	This reverts commit r224043 and r224042. check-llvm was failing. llvm-svn: 224045	2014-12-11 20:03:57 +00:00
Matthias Braun	42e36608f0	[CodeGen] Add print and verify pass after each MachineFunctionPass by default Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042	2014-12-11 19:42:05 +00:00
Hal Finkel	f4a8d09521	[PowerPC] Implement BuildSDIVPow2, lower i64 pow2 sdiv using sradi PPCISelDAGToDAG contained existing code to lower i32 sdiv by a power-of-2 using srawi/addze, but did not implement the i64 case. DAGCombine now contains a callback specifically designed for this purpose (BuildSDIVPow2), and part of the logic has been moved to an implementation of that callback. Doing this lowering using BuildSDIVPow2 likely does not matter, compared to handling everything in PPCISelDAGToDAG, for the positive divisor case, but the negative divisor case, which generates an additional negation, can potentially benefit from additional folding from DAGCombine. Now, both the i32 and the i64 cases have been implemented. Fixes PR20732. llvm-svn: 224033	2014-12-11 18:37:52 +00:00
Bill Schmidt	41e95a89e4	[PowerPC 4/4] Enable little-endian support for VSX. With the foregoing three patches, VSX instructions can be used for little endian. This patch removes the restriction that prevented this, and re-enables the test cases from the first three patches. llvm-svn: 223792	2014-12-09 16:59:57 +00:00
Bill Schmidt	015b0526cc	[PowerPC 3/4] Little-endian adjustments for VSX vector shuffle When performing instruction selection for ISD::VECTOR_SHUFFLE, there is special code for handling v2f64 and v2i64 using VSX instructions. This code must be adjusted for little-endian. Because the two inputs are treated as a double-wide register, we must swap their order for little endian. To get the appropriate mask elements to use with the big-endian biased XXPERMDI instruction, we must reverse their order and invert the bits. A new test is added to test the 16 possible values of the shuffle mask. It is initially disabled for reasons specified in the test. It is re-enabled by patch 4/4. llvm-svn: 223791	2014-12-09 16:52:29 +00:00
Bill Schmidt	8456365c7d	[PowerPC 2/4] Little-endian adjustments for VSX insert/extract operations For little endian, we need to make some straightforward adjustments in the code expansions for scalar_to_vector and vector_extract of v2f64. First, scalar_to_vector must place the scalar into vector element zero. However, our implementation of SUBREG_TO_REG will place it into big-element vector element zero (high-order bits), and for little endian we need it in the low-order bits. The LE implementation splats the high-order doubleword into the low-order doubleword. Second, the meaning of (vector_extract x, 0) and (vector_extract x, 1) must be reversed for similar reasons. A new test is added that tests code generation for insertelement and extractelement for both element 0 and element 1. It is disabled in this patch but enabled in patch 4/4, for reasons stated in the test. llvm-svn: 223788	2014-12-09 16:43:32 +00:00
Bill Schmidt	cb25653e71	[PowerPC 1/4] Little-endian adjustments for VSX loads/stores This patch addresses the inherent big-endian bias in the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. These instructions load vector elements into registers left-to-right (with the first element loaded into the high-order bits of the register), regardless of the endian setting of the processor. However, these are the only vector memory instructions that permit unaligned storage accesses, so we want to use them for little-endian. To make this work, a lxvd2x or lxvw4x is replaced with an lxvd2x followed by an xxswapd, which swaps the doublewords. This works for lxvw4x as well as lxvd2x, because for lxvw4x on an LE system the vector elements are in LE order (right-to-left) within each doubleword. (Thus after lxvw2x of a <4 x float> the elements will appear as 1, 0, 3, 2. Following the swap, they will appear as 3, 2, 0, 1, as desired.) For stores, an stxvd2x or stxvw4x is replaced with an stxvd2x preceded by an xxswapd. Introduction of extra swap instructions provides correctness, but obviously is not ideal from a performance perspective. Future patches will address this with optimizations to remove most of the introduced swaps, which have proven effective in other implementations. The introduction of the swaps is performed during lowering of LOAD, STORE, INTRINSIC_W_CHAIN, and INTRINSIC_VOID operations. The latter are used to translate intrinsics that specify the VSX loads and stores directly into equivalent sequences for little endian. Thus code that uses vec_vsx_ld and vec_vsx_st does not have to be modified to be ported from BE to LE. We introduce new PPCISD opcodes for LXVD2X, STXVD2X, and XXSWAPD for use during this lowering step. In PPCInstrVSX.td, we add new SDType and SDNode definitions for these (PPClxvd2x, PPCstxvd2x, PPCxxswapd). These are recognized during instruction selection and mapped to the correct instructions. Several tests that were written to use -mcpu=pwr7 or pwr8 are modified to disable VSX on LE variants because code generation changes with this and subsequent patches in this set. I chose to include all of these in the first patch than try to rigorously sort out which tests were broken by one or another of the patches. Sorry about that. The new test vsx-ldst-builtin-le.ll, and the changes to vsx-ldst.ll, are disabled until LE support is enabled because of breakages that occur as noted in those tests. They are re-enabled in patch 4/4. llvm-svn: 223783	2014-12-09 16:35:51 +00:00
Bill Schmidt	e2ee779bbb	Restore r223709 as it was meant to be, and enable FeatureP8Vector for P8 llvm-svn: 223751	2014-12-09 03:02:48 +00:00
NAKAMURA Takumi	12ce662a42	Revert r223709, "[PowerPC]Activate FeatureVSX for the Power target", to unbreak bots. CodeGen/PowerPC/vsx-p8.ll was failing. '+power8-vector' is not a recognized feature for this target (ignoring feature) llvm/test/CodeGen/PowerPC/vsx-p8.ll:33:14: error: expected string not found in input ; CHECK-REG: lxvw4x 34, 0, 3 ^ <stdin>:50:2: note: scanning from here .align 3 ^ <stdin>:61:2: note: possible intended match here lvx 3, 0, 3 ^ llvm-svn: 223729	2014-12-09 01:03:27 +00:00
Bill Seurer	132393f6f8	[PowerPC]Activate FeatureVSX for the Power target This change activates FeatureVSX for Power 7 and Power 8 in PPC.td. http://reviews.llvm.org/D6570 llvm-svn: 223709	2014-12-08 23:07:12 +00:00
Hal Finkel	494145ce57	[PowerPC] Don't use a non-allocatable register to implement the 'cc' alias GCC accepts 'cc' as an alias for 'cr0', and we need to do the same when processing inline asm constraints. This had previously been implemented using a non-allocatable register, named 'cc', that was listed as an alias of 'cr0', but the infrastructure does not seem to support this properly (neither the register allocator nor the scheduler properly accounts for the alias). Instead, we can just process this as a naming alias inside of the inline asm constraint-processing code, so we'll do that instead. There are two regression tests, one where the post-RA scheduler did the wrong thing with the non-allocatable alias, and one where the register allocator did the wrong thing. Fixes PR21742. llvm-svn: 223708	2014-12-08 22:54:22 +00:00
Bill Seurer	b4d665d454	[PowerPC]Add VSX loads/stores to fastisel for PPC target This patch adds VSX floating point loads and stores to fastisel. Along with the change to tablegen (D6220), VSX instructions are now fully supported in fastisel. http://reviews.llvm.org/D6274 llvm-svn: 223507	2014-12-05 20:15:56 +00:00
Hal Finkel	84b0b8cc3b	[PowerPC] 'cc' should be an alias only to 'cr0' We had mistakenly believed that GCC's 'cc' referred to the entire condition-code register (cr0 through cr7) -- and implemented this in r205630 to fix PR19326, but 'cc' is actually an alias only to 'cr0'. This is causing LLVM to clobber too much with legacy code with inline asm using the 'cc' clobber. Fixes PR21451. llvm-svn: 223328	2014-12-04 00:46:20 +00:00
Hal Finkel	bf150eceab	[PowerPC] Fix inline asm memory operands not to use r0 On PowerPC, inline asm memory operands might be expanded as 0($r), where $r is a register containing the address. As a result, this register cannot be r0, and we need to enforce this register subclass constraint to prevent miscompiling the code (we'd get this constraint for free with the usual instruction definitions, but that scheme has no knowledge of how we end up printing inline asm memory operands, and so here we need to do it 'by hand'). We can accomplish this within the current address-mode selection framework by introducing an explicit COPY_TO_REGCLASS node. Fixes PR21443. llvm-svn: 223318	2014-12-03 23:40:13 +00:00
Will Schmidt	67ea953093	Add TableGen info for Power8. This is based on the Power7 version, with units added and renamed to match P8. Differential Revision: http://reviews.llvm.org/D6358 llvm-svn: 223257	2014-12-03 18:46:30 +00:00
Hal Finkel	e95528845d	[PowerPC] Print all inline-asm consts as signed numbers Almost all immediates in PowerPC assembly (both 32-bit and 64-bit) are signed numbers, and it is important that we print them as such. To make sure that happens, we change PPCTargetLowering::LowerAsmOperandForConstraint so that it does all intermediate checks on a signed-extended int64_t value, and then creates the resulting target constant using MVT::i64. This will ensure that all negative values are printed as negative values (mirroring what is done in other backends to achieve the same sign-extension effect). This came up in the context of inline assembly like this: "add%I2 %0,%0,%2", ..., "Ir"(-1ll) where we used to print: addi 3,3,4294967295 and gcc would print: addi 3,3,-1 and gas accepts both forms, but our builtin assembler (correctly) does not. Now we print -1 like gcc does. While here, I replaced a bunch of custom integer checks with isInt<16> and friends from MathExtras.h. Thanks to Paul Hargrove for the bug report. llvm-svn: 223220	2014-12-03 09:37:50 +00:00
Hal Finkel	2b306926ed	[PowerPC] Fix readcyclecounter to be custom expanded for all 32-bit targets We need to use the custom expansion of readcyclecounter on all 32-bit targets (even those with 64-bit registers). This should fix the ppc64 buildbot. llvm-svn: 223182	2014-12-03 00:19:17 +00:00
Hal Finkel	337f550328	[PowerPC] Implement readcyclecounter for PPC32 We've long supported readcyclecounter on PPC64, but it is easier there (the read of the 64-bit time-base register can be accomplished via a single instruction). This now provides an implementation for PPC32 as well. On PPC32, the time-base register is still 64 bits, but can only be read 32 bits at a time via two separate SPRs. The ISA manual explains how to do this properly (it involves re-reading the upper bits and looping if the counter has wrapped while being read). This requires PPC to implement a custom integer splitting legalization for the READCYCLECOUNTER node, turning it into a target-specific SDAG node, which then gets turned into a pseudo-instruction, which is then expanded to the necessary sequence (which has three SPR reads, the comparison and the branch). Thanks to Paul Hargrove for pointing out to me that this was still unimplemented. llvm-svn: 223161	2014-12-02 22:01:00 +00:00
Jay Foad	1e2a95bd74	[PowerPC] Fix unwind info with dynamic stack realignment Summary: PowerPC DWARF unwind info defined CFA as SP + offset even in a function where the stack had been dynamically realigned. This clearly doesn't work because the offset from SP to CFA is not a constant. Fix it by defining CFA as BP instead. This was causing the AddressSanitizer null_deref test to fail 50% of the time, depending on whether SP happened to be 32-byte aligned on entry to a particular function or not. Reviewers: willschm, uweigand, hfinkel Reviewed By: hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6410 llvm-svn: 222996	2014-12-01 09:42:32 +00:00
Hal Finkel	efd08ad91f	[PowerPC] Add asm support for cache-inhibited ld/st instructions Add assembler support for the fixed-point cache-inhibited load/store instructions. These are hypervisor-level only, so don't get too excited ;) Fixes PR21650. llvm-svn: 222976	2014-11-30 10:15:56 +00:00
Simon Pilgrim	3d5273ea80	Target triple OS detection tidyup. NFC Use Triple::isOS*() helpers where possible. llvm-svn: 222960	2014-11-29 19:18:21 +00:00
Craig Topper	0734168db8	Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files. llvm-svn: 222801	2014-11-26 00:46:26 +00:00
Hal Finkel	d19aa4cdf8	[PowerPC] Add the 'attn' instruction The attn instruction is not part of the Power ISA, but is documented in the A2 user manual, and is accepted by the GNU assembler for the A2 and the POWER4+. Reported as part of PR21650. llvm-svn: 222712	2014-11-25 00:30:11 +00:00
Hal Finkel	515f6e50f5	[PowerPC] Implement combineRepeatedFPDivisors This does not matter on newer cores (where we can use reciprocal estimates in fast-math mode anyway), but for older cores this allows us to generate better fast-math code where we have multiple FDIVs with a common divisor. llvm-svn: 222710	2014-11-24 23:45:21 +00:00
Ulrich Weigand	24b899d017	[PowerPC] Fix PR 21652 - copy st_other bits on symbol assignment When processing an assignment in the integrated assembler that sets a symbol to the value of another symbol, we need to copy the st_other bits that encode the local entry point offset. Modeled after MipsTargetELFStreamer::emitAssignment handling of the ELF::STO_MIPS_MICROMIPS flag. llvm-svn: 222672	2014-11-24 18:09:47 +00:00
NAKAMURA Takumi	3cd455da60	Add LLVMScalarOpts to LLVMPowerPCCodeGen. llvm-svn: 222516	2014-11-21 09:14:45 +00:00
Craig Topper	45dffff5e4	Remove a bunch of unnecessary typecasts to 'const TargetRegisterClass *' llvm-svn: 222509	2014-11-21 05:58:21 +00:00
Hal Finkel	ac26448a5c	[PPC] Use SeparateConstOffsetFromGEP This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM, there is a store moved out of the inner loop) and a potential speedup on MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it makes some code look cleaner, and synchronizing the backends in this regard seems like a generally good thing. llvm-svn: 222504	2014-11-21 04:35:51 +00:00
Reid Kleckner	6a21619ebc	Add out of line virtual destructors to all LLVMTargetMachine subclasses These recently all grew a unique_ptr<TargetLoweringObjectFile> member in r221878. When anyone calls a virtual method of a class, clang-cl requires all virtual methods to be semantically valid. This includes the implicit virtual destructor, which triggers instantiation of the unique_ptr destructor, which fails because the type being deleted is incomplete. This is just part of the ongoing saga of PR20337, which is affecting Blink as well. Because the MSVC ABI doesn't have key functions, we end up referencing the vtable and implicit destructor on any virtual call through a class. We don't actually end up emitting the dtor, so it'd be good if we could avoid this unneeded type completion work. llvm-svn: 222480	2014-11-20 23:37:18 +00:00
David Blaikie	60e6c80905	Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334	2014-11-19 07:49:26 +00:00
Bill Schmidt	80cd5e5dbb	[PowerPC] Add VSX builtins for vec_div This patch adds builtin support for xvdivdp and xvdivsp, along with a test case. Straightforward stuff. There's a companion patch for Clang. llvm-svn: 221983	2014-11-14 12:10:40 +00:00
Aditya Nandakumar	4d9c1ff994	We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. llvm-svn: 221926	2014-11-13 21:29:21 +00:00
Aditya Nandakumar	b93fb292df	This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively llvm-svn: 221878	2014-11-13 09:26:31 +00:00
Justin Hibbits	586b669cf7	Add support for small-model PIC for PowerPC. Summary: Large-model was added first. With the addition of support for multiple PIC models in LLVM, now add small-model PIC for 32-bit PowerPC, SysV4 ABI. This generates more optimal code, for shared libraries with less than about 16380 data objects. Test Plan: Test cases added or updated Reviewers: joerg, hfinkel Reviewed By: hfinkel Subscribers: jholewinski, mcrosier, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D5399 llvm-svn: 221791	2014-11-12 15:16:30 +00:00
Bill Schmidt	96b68de282	[PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for PowerPC, which provide programmer access to the lxvd2x, lxvw4x, stxvd2x, and stxvw4x instructions. New LLVM intrinsics are provided to represent these four instructions in IntrinsicsPowerPC.td. These are patterned after the similar intrinsics for lvx and stvx (Altivec). In PPCInstrVSX.td, these intrinsics are tied to the code gen patterns, with additional patterns to allow plain vanilla loads and stores to still generate these instructions. At -O1 and higher the intrinsics are immediately converted to loads and stores in InstCombineCalls.cpp. This will open up more optimization opportunities while still allowing the correct instructions to be generated. (Similar code exists for aligned Altivec loads and stores.) The new intrinsics are added to the code that checks for consecutive loads and stores in PPCISelLowering.cpp, as well as to PPCTargetLowering::getTgtMemIntrinsic(). There's a new test to verify the correct instructions are generated. The loads and stores tend to be reordered, so the test just counts their number. It runs at -O2, as it's not very effective to test this at -O0, when many unnecessary loads and stores are generated. I ended up having to modify vsx-fma-m.ll. It turns out this test case is slightly unreliable, but I don't know a good way to prevent problems with it. The xvmaddmdp instructions read and write the same register, which is one of the multiplicands. Commutativity allows either to be chosen. If the FMAs are reordered differently than expected by the test, the register assignment can be different as a result. Hopefully this doesn't change often. There is a companion patch for Clang. llvm-svn: 221767	2014-11-12 04:19:40 +00:00
Rafael Espindola	10f65de3be	Pass an ArrayRef to MCDisassembler::getInstruction. With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t> instead of a MemoryObject. Even on X86 there is a maximum size an instruction can have. Given that, it seems way simpler and more efficient to just pass an ArrayRef to the disassembler instead of a MemoryObject and have it do a virtual call every time it wants some extra bytes. llvm-svn: 221751	2014-11-12 02:04:27 +00:00
Bill Schmidt	2b1221e06b	[PowerPC] Replace foul hackery with real calls to __tls_get_addr My original support for the general dynamic and local dynamic TLS models contained some fairly obtuse hacks to generate calls to __tls_get_addr when lowering a TargetGlobalAddress. Rather than generating real calls, special GET_TLS_ADDR nodes were used to wrap the calls and only reveal them at assembly time. I attempted to provide correct parameter and return values by chaining CopyToReg and CopyFromReg nodes onto the GET_TLS_ADDR nodes, but this was also not fully correct. Problems were seen with two back-to-back stores to TLS variables, where the call sequences ended up overlapping with unhappy results. Additionally, since these weren't real calls, the proper register side effects of a call were not recorded, so clobbered values were kept live across the calls. The proper thing to do is to lower these into calls in the first place. This is relatively straightforward; see the changes to PPCTargetLowering::LowerGlobalTLSAddress() in PPCISelLowering.cpp. The changes here are standard call lowering, except that we need to track the fact that these calls will require a relocation. This is done by adding a machine operand flag of MO_TLSLD or MO_TLSGD to the TargetGlobalAddress operand that appears earlier in the sequence. The calls to LowerCallTo() eventually find their way to LowerCall_64SVR4() or LowerCall_32SVR4(), which call FinishCall(), which calls PrepareCall(). In PrepareCall(), we detect the calls to __tls_get_addr and immediately snag the TargetGlobalTLSAddress with the annotated relocation information. This becomes an extra operand on the call following the callee, which is expected for nodes of type tlscall. We change the call opcode to CALL_TLS for this case. Back in FinishCall(), we change it again to CALL_NOP_TLS for 64-bit only, since we require a TOC-restore nop following the call for the 64-bit ABIs. During selection, patterns in PPCInstrInfo.td and PPCInstr64Bit.td convert the CALL_TLS nodes into BL_TLS nodes, and convert the CALL_NOP_TLS nodes into BL8_NOP_TLS nodes. This replaces the code removed from PPCAsmPrinter.cpp, as the BL_TLS or BL8_NOP_TLS nodes can now be emitted normally using their patterns and the associated printTLSCall print method. Finally, as a result of these changes, all references to get-tls-addr in its various guises are no longer used, so they have been removed. There are existing TLS tests to verify the changes haven't messed anything up). I've added one new test that verifies that the problem with the original code has been fixed. llvm-svn: 221703	2014-11-11 20:44:09 +00:00
Rafael Espindola	476a83ecd4	MCAsmParserExtension has a copy of the MCAsmParser. Use it. Base classes were storing a second copy. llvm-svn: 221667	2014-11-11 05:18:41 +00:00
Rafael Espindola	8cb53479d4	Misc style fixes. NFC. This fixes a few cases of: * Wrong variable name style. * Lines longer than 80 columns. * Repeated names in comments. * clang-format of the above. This make the next patch a lot easier to read. llvm-svn: 221615	2014-11-10 18:11:10 +00:00
Rafael Espindola	20114e4c02	Remove redundant calls to isMaterializable. This removes calls to isMaterializable in the following cases: * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions. * It was followed by a call to Materialize. Just call Materialize and check EC. llvm-svn: 221050	2014-11-01 16:46:18 +00:00
Bill Schmidt	5c5103e17e	[PowerPC] Initial VSX intrinsic support, with min/max for vector double Now that we have initial support for VSX, we can begin adding intrinsics for programmer access to VSX instructions. This patch adds basic support for VSX intrinsics in general, and tests it by implementing intrinsics for minimum and maximum for the vector double data type. The LLVM portion of this is quite straightforward. There is a companion patch for Clang. llvm-svn: 220988	2014-10-31 19:19:07 +00:00
Ulrich Weigand	5b86d1f937	[PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code Since block address values can be larger than 2GB in 64-bit code, they cannot be loaded simply using an @l / @ha pair, but instead must be loaded from the TOC, just like GlobalAddress, ConstantPool, and JumpTable values are. The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where temporary labels could not be used as TOC values, since code would attempt (and fail) to use GetOrCreateSymbol to create a symbol of the same name as the temporary label. llvm-svn: 220959	2014-10-31 10:33:14 +00:00
Sanjay Patel	d9b7837012	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Bill Schmidt	1033a05374	[PATCH] Support select-cc for VSFRC when VSX is enabled A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to handle <2 x double> cases. This patch adds SELECT_VSFRC and SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the f64 type when VSX is enabled. The changes are analogous to those in the previous patch. I've added a new variant to vsx.ll to test the code generation. (I also cleaned up a little formatting in PPCInstrVSX.td from the previous patch.) llvm-svn: 220395	2014-10-22 16:58:20 +00:00
Bill Schmidt	884633f291	[PowerPC] Support select-cc for VSX The tests test/CodeGen/Generic/select-cc.ll and test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The problem is that the lowering logic for the SELECT and SELECT_CC operations doesn't currently support the VSX registers. This patch fixes that. In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this for other register classes. Similar pseudos are added in PPCInstrVSX.td (they must be there, because the "vsrc" register class definition appears there) for the VSRC register class. The SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC. The rest of the patch just adds logic for SELECT_VSRC wherever similar logic appears for SELECT_VRRC. There are no new test cases because the existing tests above test this, along with a variant in test/CodeGen/PowerPC/vsx.ll. After discussion with Hal, a future patch will add similar _VSFRC variants to override f64 type handling (currently using F8RC). llvm-svn: 220385	2014-10-22 13:13:40 +00:00
Bill Schmidt	b654c2bb89	[PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the FMA mutation pass. If we have a situation where a killed product register is the same register as the FMA target, such as: %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5, %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 then the substitution makes no sense. We end up getting a crash when we try to extend the interval associated with the killed product register, as there is already a live range for %vreg5 there. This patch just disables the mutation under those circumstances. Since recipest.ll generates different code with VMX enabled, I've modified that test to use -mattr=-vsx. I've borrowed the code from that test that exposed the bug and placed it in fma-mutate.ll, where it tests several mutation opportunities including the "bad" one. llvm-svn: 220290	2014-10-21 13:02:37 +00:00
Bill Schmidt	4290f0e275	[PowerPC] Change assert to better form llvm-svn: 220092	2014-10-17 21:19:59 +00:00
Bill Schmidt	3e4d52c10c	[PowerPC] Change liveness testing in VSX FMA mutation pass With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's revised in this patch. The interval test is designed to only work for virtual registers, but in this case the AddendSrcReg is physical. Since there is already a walk of the MIs between the AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg in that loop. At Hal Finkel's request, I converted the liveness test to an assert restricted to virtual registers. I've changed the fma.ll test to have VSX and non-VSX variants so we can test both kinds of multiply-adds. llvm-svn: 220090	2014-10-17 21:02:44 +00:00
Bill Schmidt	d3f8b7e4eb	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4r32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047	2014-10-17 15:13:38 +00:00
Rafael Espindola	1dba93c519	Simplify handling of --noexecstack by using getNonexecutableStackSection. llvm-svn: 219799	2014-10-15 16:12:52 +00:00
Eric Christopher	c65e60f7c8	Use the triple to figure out if this is a darwin target, not the subtarget. llvm-svn: 219673	2014-10-14 08:25:26 +00:00
Benjamin Kramer	8a11a14550	MC: Bit pack MCSymbolData. On x86_64 this brings it from 80 bytes to 64 bytes. Also make any member variables private and clean up uses to go through the existing accessors. NFC. llvm-svn: 219573	2014-10-11 15:07:21 +00:00
Bill Schmidt	87ba7a67bb	[PowerPC] Reduce names from Power8Vector to P8Vector Per Hal Finkel's review, improving typability of some variable names. llvm-svn: 219514	2014-10-10 17:21:15 +00:00
Bill Schmidt	581893751d	[PowerPC] Add feature for Power8 vector extensions The current VSX feature for PowerPC specifies availability of the VSX instructions added with the 2.06 architecture version. With 2.07, the architecture adds new instructions to both the Category:Vector and Category:VSX instruction sets. Additionally, unaligned vector storage operations have improved performance. This patch adds a feature to provide access to the new instructions and performance capabilities of Power8. For compatibility with GCC, the feature is controlled via a new -mpower8-vector switch, and the feature causes the __POWER8_VECTOR__ builtin define to be generated by the preprocessor. There is a companion patch for cfe being committed at the same time. llvm-svn: 219501	2014-10-10 15:09:28 +00:00
Samuel Antao	83b3411742	Fix bug in GPR to FPR moves in PPC64LE. The current implementation of GPR->FPR register moves uses a stack slot. This mechanism writes a double word and reads a word. In big-endian the load address must be displaced by 4-bytes in order to get the right value. In little endian this is no longer required. This patch fixes the issue and adds LE regression tests to fast-isel-conversion which currently expose this problem. llvm-svn: 219441	2014-10-09 20:42:56 +00:00
Bill Schmidt	ddf2d00a6b	[PPC64] VSX indexed-form loads use wrong instruction format The VSX instruction definitions for lxsdx, lxvd2x, lxvdsx, and lxvw4x incorrectly use the XForm_1 instruction format, rather than the XX1Form instruction format. This is likely a pasto when creating these instructions, which were based on lvx and so forth. This patch uses the correct format. The existing reformatting test (test/MC/PowerPC/vsx.s) missed this because the two formats differ only in that XX1Form has an extension to the target register field in bit 31. The tests for these instructions used a target register of 7, so the default of 0 in bit 31 for XForm_1 didn't expose a problem. For register numbers 32-63 this would be noticeable. I've changed the test to use higher register numbers to verify my change is effective. llvm-svn: 219416	2014-10-09 17:51:35 +00:00
Eric Christopher	faca264c55	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106	2014-10-06 06:45:36 +00:00
Robin Morisset	95772cea0c	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
Hal Finkel	2093a3cb26	[PowerPC] Modern Book-E cores support sync Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923	2014-10-02 22:34:22 +00:00
Robin Morisset	8895df3e75	[Power] Improve the expansion of atomic loads/stores Summary: Atomic loads and store of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops.. an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922	2014-10-02 22:27:07 +00:00
Eric Christopher	3eb7c19a39	constify the TargetMachine argument used in the subtarget and lowering constructors. llvm-svn: 218832	2014-10-01 21:36:28 +00:00
Eric Christopher	bce38d60f8	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Eric Christopher	4ce55b7a5c	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Sanjay Patel	c095454ca2	Split the estimate() interface into separate functions for each type. NFC. It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. Eg, x86 will prefer a different NR estimate implementation. ARM will want to use it's 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698	2014-09-30 20:28:48 +00:00
Sanjay Patel	98a98574c5	Refactor reciprocal and reciprocal square root estimate into target-independent functions (part 2). This is purely refactoring. No functional changes intended. PowerPC is the only target that is currently using this interface. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) And: z = y / x into: z = y * rcpe(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction along with the number of refinement steps needed to make the estimate usable. Differential Revision: http://reviews.llvm.org/D5484 llvm-svn: 218553	2014-09-26 23:01:47 +00:00
Robin Morisset	64053dff5f	[Power] Use AtomicExpandPass for fence insertion, and use lwsync where appropriate Summary: This patch makes use of AtomicExpandPass in Power for inserting fences around atomic as part of an effort to remove fence insertion from SelectionDAGBuilder. As a big bonus, it lets us use sync 1 (lightweight sync, often used by the mnemonic lwsync) instead of sync 0 (heavyweight sync) in many cases. I also added a test, as there was no test for the barriers emitted by the Power backend for atomic loads and stores. Test Plan: new test + make check-all Reviewers: jfb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5180 llvm-svn: 218331	2014-09-23 20:46:49 +00:00
Lang Hames	719ff0ff4f	[MCJIT] Remove PPCRelocations.h - it's no longer used. This was overlooked in r218320, which removed the relocation headers for other targets. Thanks to Ulrich Weigand for catching it. llvm-svn: 218327	2014-09-23 19:17:48 +00:00
Sanjay Patel	1235f854b3	Refactor reciprocal square root estimate into target-independent function; NFC. This is purely a plumbing patch. No functional changes intended. The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this: z = y / sqrt(x) into: z = y * rsqrte(x) using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 . The first step is to add a target hook for RSQRTE, take the already target-independent code selfishly hoarded by PPC, and put it into DAGCombiner. Next steps: The code in DAGCombiner::BuildRSQRTE() should be refactored further; tests that exercise that logic need to be added. Logic in PPCTargetLowering::BuildRSQRTE() should be hoisted into DAGCombiner. X86 and AArch64 overrides for TargetLowering.BuildRSQRTE() should be added. Differential Revision: http://reviews.llvm.org/D5425 llvm-svn: 218219	2014-09-21 15:19:15 +00:00
Hal Finkel	0c0c256ad7	Optionally enable more-aggressive FMA formation in DAGCombine The heuristic used by DAGCombine to form FMAs checks that the FMUL has only one use, but this is overly-conservative on some systems. Specifically, if the FMA and the FADD have the same latency (and the FMA does not compete for resources with the FMUL any more than the FADD does), there is no need for the restriction, and furthermore, forming the FMA leaving the FMUL can still allow for higher overall throughput and decreased critical-path length. Here we add a new TLI callback, enableAggressiveFMAFusion, false by default, to elide the hasOneUse check. This is enabled for PowerPC by default, as most PowerPC systems will benefit. Patch by Olivier Sallenave, thanks! llvm-svn: 218120	2014-09-19 11:42:56 +00:00
Aaron Ballman	c9d2119dc2	Reverting NFC changes from r218050. Instead, the warning was disabled for GCC in r218059, so these changes are no longer required. llvm-svn: 218062	2014-09-18 17:34:23 +00:00
Aaron Ballman	2e4b3f3dca	Fixing a bunch of -Woverloaded-virtual warnings due to hiding getSubtargetImpl from the base class. NFC. llvm-svn: 218050	2014-09-18 13:27:14 +00:00
Eric Christopher	c75fbbac7c	Add a new pass FunctionTargetTransformInfo. This pass serves as a shim between the TargetTransformInfo immutable pass and the Subtarget via the TargetMachine and Function. Migrate a single call from BasicTargetTransformInfo as an example and provide shims where TargetMachine begins taking a Function to determine the subtarget. No functional change. llvm-svn: 218004	2014-09-18 00:34:14 +00:00
Samuel Antao	ec112df870	Fix FastISel bug in boolean returns for PowerPC. For PPC targets, FastISel does not take the sign extension information into account when selecting return instructions whose operands are constants. A consequence of this is that the return of boolean values is not correct. This patch fixes the problem by evaluating the sign extension information also for constants, forwarding this information to PPCMaterializeInt which takes this information to drive the sign extension during the materialization. llvm-svn: 217993	2014-09-17 23:25:06 +00:00
Samuel Antao	7871b3496a	Remove unnecessary blank space (test commit) llvm-svn: 217991	2014-09-17 22:47:28 +00:00
Bill Schmidt	b47e7e1e1d	Address comments on r217622 llvm-svn: 217680	2014-09-12 14:26:36 +00:00
Bill Schmidt	020f302f01	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622	2014-09-11 20:10:03 +00:00
Sanjay Patel	8030ed3639	Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable. "Unroll" is not the appropriate name for this variable. Clang already uses the term "interleave" in pragmas and metadata for this. Differential Revision: http://reviews.llvm.org/D5066 llvm-svn: 217528	2014-09-10 17:58:16 +00:00
Craig Topper	743e490f28	Use cast to MVT instead of EVT on a couple calls to getSizeInBits. llvm-svn: 217473	2014-09-10 04:51:36 +00:00
Juergen Ributzka	76dd2e3da7	[FastISel][tblgen] Rename tblgen generated FastISel functions. NFC. This is the final round of renaming. This changes tblgen to emit lower-case function names for FastEmitInst_* and FastEmit_*, and updates all its uses in the source code. Reviewed by Eric llvm-svn: 217075	2014-09-03 20:56:59 +00:00
Juergen Ributzka	fa7bc008ce	[FastISel] Rename public visible FastISel functions. NFC. This commit renames the following public FastISel functions: LowerArguments -> lowerArguments SelectInstruction -> selectInstruction TargetSelectInstruction -> fastSelectInstruction FastLowerArguments -> fastLowerArguments FastLowerCall -> fastLowerCall FastLowerIntrinsicCall -> fastLowerIntrinsicCall FastEmitZExtFromI1 -> fastEmitZExtFromI1 FastEmitBranch -> fastEmitBranch UpdateValueMap -> updateValueMap TargetMaterializeConstant -> fastMaterializeConstant TargetMaterializeAlloca -> fastMaterializeAlloca TargetMaterializeFloatZero -> fastMaterializeFloatZero LowerCallTo -> lowerCallTo Reviewed by Eric llvm-svn: 217074	2014-09-03 20:56:52 +00:00
Eric Christopher	e1f21228eb	Remove resetSubtargetFeatures as it is unused. llvm-svn: 217071	2014-09-03 20:36:31 +00:00
Benjamin Kramer	e991977346	Add override to overriden virtual methods, remove virtual keywords. No functionality change. Changes made by clang-tidy + some manual cleanup. llvm-svn: 217028	2014-09-03 11:41:21 +00:00
Eric Christopher	2f6f860aaa	Reinstate "Nuke the old JIT." Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reinstates commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 216982	2014-09-02 22:28:02 +00:00
Alexey Samsonov	2724968a53	Fix signed integer overflow in PPCInstPrinter. This bug was reported by UBSan. llvm-svn: 216917	2014-09-02 17:38:34 +00:00
Hal Finkel	46b8c3f54e	[PowerPC] Guard against illegal selection of add for TargetConstant operands r208640 was reverted because it caused a self-hosting failure on ppc64. The underlying cause was the formation of ISD::ADD nodes with ISD::TargetConstant operands. Because we have no patterns for 'add' taking 'timm' nodes, these are selected as r+r add instructions (which is a miscompile). Guard against this kind of behavior in the future by making the backend crash should this occur (instead of silently generating invalid output). llvm-svn: 216897	2014-09-02 06:23:54 +00:00
Craig Topper	57c93cf3ef	Remove 'virtual' keyword from methods markedwith 'override' keyword. llvm-svn: 216823	2014-08-30 16:48:34 +00:00
Justin Hibbits	fd1faff02b	Test commit. Fix whitespace from a previous patch of mine. llvm-svn: 216650	2014-08-28 04:40:55 +00:00
Karthik Bhat	d94045aa5a	Allow vectorization of division by uniform power of 2. This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible. Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend. Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971) llvm-svn: 216371	2014-08-25 04:56:54 +00:00
Hal Finkel	7014e3d50f	[PowerPC] Add support for dcbtst and icbt (prefetch) Adds code generation support for dcbtst (data cache prefetch for write) and icbt (instruction cache prefetch for read - Book E cores only). We still end up with a 'cannot select' error for the non-supported prefetch intrinsic forms. This will be fixed in a later commit. Fixes PR20692. llvm-svn: 216339	2014-08-23 23:21:04 +00:00
Sanjay Patel	7498546daf	name change: isPow2DivCheap -> isPow2SDivCheap isPow2DivCheap That name doesn't specify signed or unsigned. Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong: srl/add/sra is not the general sequence for signed integer division by power-of-2. We need one more 'sra': sra/srl/add/sra That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case. This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods. No functional change intended. Differential Revision: http://reviews.llvm.org/D5010 llvm-svn: 216237	2014-08-21 22:31:48 +00:00
Tim Northover	9127b613b1	TableGen: allow use of uint64_t for available features mask. ARM in particular is getting dangerously close to exceeding 32 bits worth of possible subtarget features. When this happens, various parts of MC start to fail inexplicably as masks get truncated to "unsigned". Mostly just refactoring at present, and there's probably no way to test. llvm-svn: 215887	2014-08-18 11:49:42 +00:00
Hal Finkel	5f7466abdb	[PowerPC] Mark fixed-offset byvals as pointed-to by IR values A byval object, even if allocated at a fixed offset (prescribed by the ABI) is pointed to by IR values. Most fixed-offset stack objects are not pointed-to by IR values, so the default is to assume this is not possible. However, we need to override the default in this case (instruction scheduling can cause miscompiles otherwise). Fixes PR20280. llvm-svn: 215795	2014-08-16 00:17:05 +00:00
Hal Finkel	519c3f0279	[PowerPC] Darwin byval arguments are not immutable On PPC/Darwin, byval arguments occur at fixed stack offsets in the callee's frame, but are not immutable -- the pointer value is directly available to the higher-level code as the address of the argument, and the value of the byval argument can be modified at the IR level. This is necessary, but not sufficient, to fix PR20280. When PR20280 is fixed in a follow-up commit, its test case will cover this change. llvm-svn: 215793	2014-08-16 00:16:29 +00:00
Rafael Espindola	80b0c0c71c	Remove HasLEB128. We already require CFI, so it should be safe to require .leb128 and .uleb128. llvm-svn: 215712	2014-08-15 14:01:07 +00:00
Benjamin Kramer	9d3803050f	PPC: Clean up pointer casting, no functionality change. Silences GCC's -Wcast-qual. llvm-svn: 215703	2014-08-15 11:05:45 +00:00
Bill Schmidt	c5f23638d5	[PPC64] Add missing dependency on X2 to LDinto_toc. The LDinto_toc pattern has been part of 64-bit PowerPC for a long time, and represents loading from a memory location into the TOC register (X2). However, this pattern doesn't explicitly record that it modifies that register. This patch adds the missing dependency. It was very surprising to me that this has never shown up as a problem in the past, and that we only saw this problem recently in a single scenario when building a self-hosted clang. It turns out that in most cases we have another dependency present that keeps the LDinto_toc instruction tied in place. LDinto_toc is used for TOC restore following a call site, so this is a typical sequence: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use> Because the LDinto_toc is inserted prior to the ADJCALLSTACKUP, there is a natural anti-dependency between the two that keeps it in place. Therefore we don't usually see a problem. However, in one particular case, one call is followed immediately by another call, and the second call requires a parameter that is a TOC-relative address. This is the code sequence: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 ADJCALLSTACKUP 96, 0, %R1<imp-def>, %R1<imp-use> ADJCALLSTACKDOWN 96, %R1<imp-def>, %R1<imp-use> %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 Note that the back-to-back stack adjustments are the same size! The back end is smart enough to recognize this and optimize them away: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1 %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 Now there is nothing to prevent the ADDIStocHA instruction from moving ahead of the LDinto_toc instruction, and because of the longest-path heuristic, this is what happens. With the accompanying patch, %X2 is represented as an implicit def: BCTRL8 <regmask>, %CTR8<imp-use>, %RM<imp-use>, %X3<imp-use>, %X4<imp-use>, %X5<imp-use>, %X12<imp-use>, %X1<imp-def>, ... LDinto_toc 24, %X1, %X2<imp-def,dead> ADJCALLSTACKUP 96, 0, %R1<imp-def,dead>, %R1<imp-use> ADJCALLSTACKDOWN 96, %R1<imp-def,dead>, %R1<imp-use> %vreg39<def> = ADDIStocHA %X2, <ga:@.str>; G8RC_and_G8RC_NOX0:%vreg39 %vreg40<def> = ADDItocL %vreg39<kill>, <ga:@.str>; G8RC:%vreg40 G8RC_and_G8RC_NOX0:%vreg39 So now when the two stack adjustments are removed, ADDIStocHA is prevented from being moved above LDinto_toc. I have not yet created a test case for this, because the original failure occurs on a relatively large function that needs reduction. However, this is a fairly serious bug, despite its infrequency, and I wanted to get this patch onto the list as soon as possible so that it can be considered for a 3.5 backport. I'll work on whittling down a test case. Have we missed the boat for 3.5 at this point? Thanks, Bill llvm-svn: 215685	2014-08-15 01:25:26 +00:00
Benjamin Kramer	da144ed5a2	Canonicalize header guards into a common format. Add header guards to files that were missing guards. Remove #endif comments as they don't seem common in LLVM (we can easily add them back if we decide they're useful) Changes made by clang-tidy with minor tweaks. llvm-svn: 215558	2014-08-13 16:26:38 +00:00
Hal Finkel	97fb1d4d91	[PowerPC] Implement PPCTargetLowering::getTgtMemIntrinsic This implements PPCTargetLowering::getTgtMemIntrinsic for Altivec load/store intrinsics. As with the construction of the MachineMemOperands for the intrinsic calls used for unaligned load/store lowering, the only slight complication is that we need to represent a larger memory range than the loaded/stored value-type size (because the address is rounded down to an aligned address, and we need to conservatively represent the entire possible range of the actual access). This required adding an extra size field to TargetLowering::IntrinsicInfo, and this was done in a way that required no modifications to other targets (the size defaults to the store size of the provided memory data type). This fixes test/CodeGen/PowerPC/unal-altivec-wint.ll (so it can be un-XFAILed). llvm-svn: 215512	2014-08-13 01:15:40 +00:00
Joerg Sonnenberger	4a2da5b0e2	@l and friends adjust their value depending the context used in. For ori, they are unsigned, for addi, signed. Create a new target expression type to handle this and evaluate Fixups accordingly. llvm-svn: 215315	2014-08-10 12:41:50 +00:00
Joerg Sonnenberger	df2012d0aa	If available, pass down the Fixup object to EvaluateAsRelocatable. At least on PowerPC, the interpretation of certain modifiers depends on the context they appear in. llvm-svn: 215310	2014-08-10 11:35:12 +00:00
Joerg Sonnenberger	f744f70919	Allow the third argument for the subi family to be an expression. llvm-svn: 215286	2014-08-09 17:10:26 +00:00
Joerg Sonnenberger	a632bc8df4	Use the full form of dccci and iccci from the early PPC 405 documents, since the operands are actually used on those cores. Provide aliases for the only documented case in the newer Power ISA speec. llvm-svn: 215282	2014-08-09 13:58:31 +00:00
Eric Christopher	3461eab467	Initialize PPC DataLayout based on the Triple only. llvm-svn: 215281	2014-08-09 04:53:17 +00:00
Eric Christopher	846e6d954c	Remove extraneous 64-bit argument to the PPC TargetMachine constructor and update initialization. llvm-svn: 215280	2014-08-09 04:38:56 +00:00
Joerg Sonnenberger	71ad99879f	Allow large immediates for branch instructions in 32bit mode. llvm-svn: 215240	2014-08-08 20:57:58 +00:00
Joerg Sonnenberger	8e42fa93a6	Provide an implementation of getNoopForMachoTarget for PPC, otherwise empty functions will assert in the MC object writer. llvm-svn: 215238	2014-08-08 19:13:23 +00:00
Joerg Sonnenberger	4df95fac67	Add low-level option for avoiding float stores from va_start until soft-float is properly supported. llvm-svn: 215221	2014-08-08 16:46:10 +00:00
Joerg Sonnenberger	ca56042175	Add support for SPE load/store from memory. llvm-svn: 215220	2014-08-08 16:43:49 +00:00
Eric Christopher	378bc328f0	Temporarily Revert "Nuke the old JIT." as it's not quite ready to be deleted. This will be reapplied as soon as possible and before the 3.6 branch date at any rate. Approved by Jim Grosbach, Lang Hames, Rafael Espindola. This reverts commits r215111, 215115, 215116, 215117, 215136. llvm-svn: 215154	2014-08-07 22:02:54 +00:00
Joerg Sonnenberger	857466a57b	Add the majority of the remaining SPE instructions. llvm-svn: 215131	2014-08-07 18:52:39 +00:00
Rafael Espindola	e9ebbe5559	Nuke the old JIT. I am sure we will be finding bits and pieces of dead code for years to come, but this is a good start. Thanks to Lang Hames for making MCJIT a good replacement! llvm-svn: 215111	2014-08-07 14:21:18 +00:00
Joerg Sonnenberger	c6357b6edc	Add mfasr and mtasr llvm-svn: 215110	2014-08-07 13:35:34 +00:00
Joerg Sonnenberger	57c4e076f0	Add mfrtcu and mfrtcl instructions llvm-svn: 215109	2014-08-07 13:16:58 +00:00
Joerg Sonnenberger	91c0c415d7	Support mttbl and mttbu mnemonic llvm-svn: 215108	2014-08-07 13:06:23 +00:00
Joerg Sonnenberger	22c67059fa	Add RFID instruction. llvm-svn: 215105	2014-08-07 12:39:59 +00:00
Joerg Sonnenberger	2b751a0bc6	Fix Itineray class of rfi llvm-svn: 215104	2014-08-07 12:35:16 +00:00
Joerg Sonnenberger	9980cac0b2	Spell e500 feature in lower case. llvm-svn: 215103	2014-08-07 12:31:28 +00:00
Joerg Sonnenberger	d344d15c74	Add first bunch of SPE instructions. As they overlap with Altivec, mark them as parser-only until the disassembler is extended to handle predicates properly. llvm-svn: 215102	2014-08-07 12:18:21 +00:00
Eric Christopher	4a1cdb2ba7	Remove the target machine from CCState. Previously it was only used to get the subtarget and that's accessible from the MachineFunction now. This helps clear the way for smaller changes where we getting a subtarget will require passing in a MachineFunction/Function as well. llvm-svn: 214988	2014-08-06 18:45:26 +00:00
Rafael Espindola	c981388b03	Remove a virtual function from TargetMachine. NFC. llvm-svn: 214929	2014-08-05 22:10:21 +00:00
Bill Schmidt	8159f9047b	[PowerPC] Swap arguments and adjust shift count for vsldoi on little endian Commits r213915 and r214718 fix recognition of shuffle masks for vmrg* and vpku*um instructions for a little-endian target, by swapping the input arguments. The vsldoi instruction requires similar treatment, and also needs its shift count adjusted for little endian. Reviewed by Ulrich Weigand. This is a bug fix candidate for release 3.5 (and hopefully the last of those for PowerPC). llvm-svn: 214923	2014-08-05 20:47:25 +00:00
Joerg Sonnenberger	6eb6423316	Add accessors for the PPC 403 bank registers. llvm-svn: 214875	2014-08-05 15:45:15 +00:00
Joerg Sonnenberger	cc8f67da81	Accessors for SSR2 and SSR3 on PPC 403. llvm-svn: 214867	2014-08-05 14:53:05 +00:00
Joerg Sonnenberger	bc6e9b7c12	Add dci/ici instructions for PPC 476 and friends. llvm-svn: 214864	2014-08-05 14:40:32 +00:00
Joerg Sonnenberger	f726eed55e	Add mftblo and mftbhi for PPC 4xx. llvm-svn: 214863	2014-08-05 14:18:16 +00:00
Joerg Sonnenberger	d97414d887	Add lswi / stswi for assembler use with a warning to not add patterns for them. llvm-svn: 214862	2014-08-05 13:34:01 +00:00
Eric Christopher	67c04e77e5	Have MachineFunction cache a pointer to the subtarget to make lookups shorter/easier and have the DAG use that to do the same lookup. This can be used in the future for TargetMachine based caching lookups from the MachineFunction easily. Update the MIPS subtarget switching machinery to update this pointer at the same time it runs. llvm-svn: 214838	2014-08-05 02:39:49 +00:00
Joerg Sonnenberger	c84fe6e931	Add TCR register access llvm-svn: 214826	2014-08-04 23:53:42 +00:00
Joerg Sonnenberger	cd34d01e29	Add PPC 603's tlbld and tlbli instructions. llvm-svn: 214825	2014-08-04 23:49:45 +00:00
Bill Schmidt	5e2cd3791c	[PPC64LE] Fix wrong IR for vec_sld and vec_vsldoi My original LE implementation of the vsldoi instruction, with its altivec.h interfaces vec_sld and vec_vsldoi, produces incorrect shufflevector operations in the LLVM IR. Correct code is generated because the back end handles the incorrect shufflevector in a consistent manner. This patch and a companion patch for Clang correct this problem by removing the fixup from altivec.h and the corresponding fixup from the PowerPC back end. Several test cases are also modified to reflect the now-correct LLVM IR. llvm-svn: 214800	2014-08-04 23:21:01 +00:00
Joerg Sonnenberger	b94b28812d	Add simplified aliases for access to DCCR, ICCR, DEAR and ESR llvm-svn: 214797	2014-08-04 22:56:42 +00:00
Joerg Sonnenberger	0f4e0ad62e	tlbre / tlbwe / tlbsx / tlbsx. variants for the PPC 4xx CPUs. llvm-svn: 214784	2014-08-04 21:28:22 +00:00
Eric Christopher	99307e99a2	Remove the TargetMachine forwards for TargetSubtargetInfo based information and update all callers. No functional change. llvm-svn: 214781	2014-08-04 21:25:23 +00:00
Joerg Sonnenberger	c3da10cbbc	Recognize mftbl as alias for mftb, for symmetry with mttb. llvm-svn: 214769	2014-08-04 20:28:34 +00:00
Joerg Sonnenberger	3be1146503	Rename PPCLinuxMCAsmInfo to PPCELFMCAsmInfo to better reflect the systems it represents. llvm-svn: 214755	2014-08-04 18:46:13 +00:00
Joerg Sonnenberger	660877af60	Allow .lcomm with alignment on ELF targets. llvm-svn: 214754	2014-08-04 18:45:10 +00:00
Joerg Sonnenberger	51a8467117	Refactor SPRG instructions. llvm-svn: 214733	2014-08-04 17:26:15 +00:00
Joerg Sonnenberger	f6049406d6	Add support for m[ft][di]bat[ul] instructions. llvm-svn: 214731	2014-08-04 17:07:41 +00:00
Joerg Sonnenberger	b8c7e54901	Add features for PPC 4xx and e500/e500mc instructions. Move the test cases for them into separate files. llvm-svn: 214724	2014-08-04 15:47:38 +00:00
Ulrich Weigand	5df23aacbe	[PowerPC] Swap arguments to vpkuhum/vpkuwum on little-endian In commit r213915, Bill fixed little-endian usage of vmrgh* and vmrgl* by swapping the input arguments. As it turns out, the exact same fix is also required for the vpkuhum/vpkuwum patterns. This fixes another regression in llvmpipe when vector support is enabled. Reviewed by Bill Schmidt. llvm-svn: 214718	2014-08-04 13:53:40 +00:00
Ulrich Weigand	bf94969247	[PowerPC] MULHU/MULHS are not legal for vector types I ran into some test failures where common code changed vector division by constant into a multiply-high operation (MULHU). But these are not implemented by the back-end, so we failed to recognize the insn. Fixed by marking MULHU/MULHS as Expand for vector types. llvm-svn: 214716	2014-08-04 13:27:12 +00:00
Ulrich Weigand	32b8ceb243	[PowerPC] Fix and improve vector comparisons This patch refactors code generation of vector comparisons. This fixes a wrong code-gen bug for ISD::SETGE for floating-point types, and improves generated code for vector comparisons in general. Specifically, the patch moves all logic deciding how to implement vector comparisons into getVCmpInst, which gets two extra boolean outputs indicating to its caller whether its needs to swap the input operands and/or negate the result of the comparison. Apart from implementing these two modifications as directed by getVCmpInst, there is no need to ever implement vector comparisons in any other manner; in particular, there is never a need to perform two separate comparisons (e.g. one for equal and one for greater-than, as code used to do before this patch). Reviewed by Bill Schmidt. llvm-svn: 214714	2014-08-04 13:13:57 +00:00
Joerg Sonnenberger	1253396c1a	tlbia support llvm-svn: 214640	2014-08-02 20:16:29 +00:00
Joerg Sonnenberger	e208527e31	mfdcr / mtdcr support llvm-svn: 214639	2014-08-02 20:00:26 +00:00
Joerg Sonnenberger	ca3cb82714	Don't use additional arguments for dss and friends to satisfy DSS_Form, when let can do the same thing. Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler. llvm-svn: 214636	2014-08-02 15:09:41 +00:00
Eric Christopher	dfc6457da2	Add a non-const subtarget returning function to the target machine so that we can use it to get the old-style JIT out of the subtarget. This code should be removed when the old-style JIT is removed (imminently). llvm-svn: 214560	2014-08-01 21:18:01 +00:00
Ulrich Weigand	9dae728acd	[PowerPC] PR20280 - Slots for byval parameters are not immutable Found by inspection while looking at PR20280: code would mark slots in the parameter save area where a byval parameter is passed as "immutable". This is not correct since code is allowed to modify byval parameters in place in the parameter save area. llvm-svn: 214517	2014-08-01 14:35:58 +00:00
Hal Finkel	f46410e6eb	[PowerPC] Generate unaligned vector loads using intrinsics instead of regular loads Altivec vector loads on PowerPC have an interesting property: They always load from an aligned address (by rounding down the address actually provided if necessary). In order to generate an actual unaligned load, you can generate two load instructions, one with the original address, one offset by one vector length, and use a special permutation to extract the bytes desired. When this was originally implemented, I generated these two loads using regular ISD::LOAD nodes, now marked as aligned. Unfortunately, there is a problem with this: The alignment of a load does not contribute to its identity, and SDNodes are uniqued. So, imagine that we have some unaligned load, L1, that is not aligned. The routine will create two loads, L1(aligned) and (L1+16)(aligned). Further imagine that there had already existed a load (L1+16)(unaligned) with the same chain operand as the load L1. When (L1+16)(aligned) is created as part of the lowering of L1, this load is also the (L1+16)(unaligned) node, just now marked as aligned (because the new alignment overwrites the old). But the original users of (L1+16)(unaligned) now get the data intended for the permutation yielding the data for L1, and (L1+16)(unaligned) no longer exists to get its own permutation-based expansion. This was PR19991. A second potential problem has to do with the MMOs on these loads, which can be used by AA during instruction scheduling to break chain-based dependencies. If the new "aligned" loads get the MMO from the original unaligned load, this does not represent the fact that it will load data from below the original address. Normally, this would not matter, but this load might be combined with another load pair for a previous vector, and then the dependency on the otherwise- ignored lower bytes can matter. To fix both problems, instead of generating the necessary loads using regular ISD::LOAD instructions, ppc_altivec_lvx intrinsics are used instead. These are provided with MMOs with a conservative address range. Unfortunately, I no longer have a failing test case (since PR19991 was reported, other changes in CodeGen have forced this bug back into hiding it again). Nevertheless, this should fix the underlying problem. llvm-svn: 214481	2014-08-01 05:20:41 +00:00
Hal Finkel	3be61a8b81	[PowerPC] Recognize consecutive memory accesses from intrinsics When generating unaligned vector loads, we need to search for other loads or stores nearby offset by one vector width. If we find one, then we know that we can safely generate another aligned load at that address. Otherwise, we must generate the next load using an offset of the vector width minus one byte (so we don't read off the end of the allocation if the base unaligned address happened to be aligned at runtime). We had previously done this using only other vector loads and stores, but did not consider the PowerPC-specific vector load/store intrinsics. Now we'll also consider vector intrinsics. By itself, this change is a feature enhancement, but is a necessary step toward fixing the underlying problem behind PR19991. llvm-svn: 214469	2014-08-01 01:02:01 +00:00
Louis Gerbarg	8048e52537	Make sure no loads resulting from load->switch DAGCombine are marked invariant Currently when DAGCombine converts loads feeding a switch into a switch of addresses feeding a load the new load inherits the isInvariant flag of the left side. This is incorrect since invariant loads can be reordered in cases where it is illegal to reoarder normal loads. This patch adds an isInvariant parameter to getExtLoad() and updates all call sites to pass in the data if they have it or false if they don't. It also changes the DAGCombine to use that data to make the right decision when creating the new load. llvm-svn: 214449	2014-07-31 21:45:05 +00:00
Joerg Sonnenberger	a5a8f9d13c	Add mtpid/mfpid for BookE. llvm-svn: 214363	2014-07-30 23:59:11 +00:00
Joerg Sonnenberger	3845a9a9b8	Refactor TLBIVAX and add tlbsx. llvm-svn: 214354	2014-07-30 22:51:15 +00:00
Joerg Sonnenberger	acae7c8f64	Add rfdi and rfmci from the e500/e500mc ISA. llvm-svn: 214339	2014-07-30 21:09:03 +00:00
Joerg Sonnenberger	e81540b210	Add BookE's tlbre, tlbwe and tlbivax instructions. llvm-svn: 214332	2014-07-30 20:44:04 +00:00
Joerg Sonnenberger	270c5a66fb	Add BookE's wrtee and wrteei instructions. llvm-svn: 214297	2014-07-30 10:32:51 +00:00
Joerg Sonnenberger	577206124d	SPRG 0 to 3 are valid outside BookE, so move them to the normal test file. Add support for accessing SPRG 4 to 7 on BookE. llvm-svn: 214295	2014-07-30 09:24:37 +00:00
Joerg Sonnenberger	db7b2c7644	Add rfci instruction. llvm-svn: 214256	2014-07-29 23:45:20 +00:00
Joerg Sonnenberger	abd973f309	mbar without argument is equivalent to mbar 0. llvm-svn: 214250	2014-07-29 23:31:27 +00:00
Joerg Sonnenberger	afb8bfcb47	Recognize BookE's mbar instruction. llvm-svn: 214244	2014-07-29 23:16:31 +00:00
Joerg Sonnenberger	5dd0828a9c	Fix typo in alias: DSIR -> DSISR llvm-svn: 214238	2014-07-29 22:42:44 +00:00
Joerg Sonnenberger	d52d4c80b5	Support move to/from segment register. llvm-svn: 214234	2014-07-29 22:21:57 +00:00
Joerg Sonnenberger	f3709585f9	Add a number of aliases for SPR access. llvm-svn: 214196	2014-07-29 18:55:43 +00:00
Joerg Sonnenberger	c3bec0cdc8	Add rfi instruction. Based on feedback by Ulrich Weigand. llvm-svn: 214181	2014-07-29 15:49:09 +00:00
Matt Arsenault	55d94a2290	Fix typos / grammar. llvm-svn: 214147	2014-07-29 00:02:40 +00:00
Ulrich Weigand	37cf88e787	[PowerPC] Support ELFv1/ELFv2 ABI selection via features While LLVM now supports both ELFv1 and ELFv2 ABIs, their use is currently hard-coded via the target triple: powerpc64-linux is always ELFv1, while powerpc64le-linux is always ELFv2. These are of course the most common scenarios, but in principle it is possible to support the ELFv2 ABI on big-endian or the ELFv1 ABI on little-endian systems (and GCC does support that), and there are some special use cases for that (e.g. certain Linux kernel versions could only be built using ELFv1 on LE). This patch implements the LLVM side of supporting this. As precedent on other platforms suggests, ABI options are passed to the back-end as features. Thus, this patch implements two features "elfv1" and "elfv2" that select the desired ABI if present. (If not, the LLVM uses the same default rules as now.) llvm-svn: 214072	2014-07-28 13:09:28 +00:00
Matt Arsenault	76c7b7a591	Add alignment value to allowsUnalignedMemoryAccess Rename to allowsMisalignedMemoryAccess. On R600, 8 and 16 byte accesses are mostly OK with 4-byte alignment, and don't need to be split into multiple accesses. Vector loads with an alignment of the element type are not uncommon in OpenCL code. llvm-svn: 214055	2014-07-27 17:46:40 +00:00
Hal Finkel	ba36f6399d	[PowerPC] Support TLS on PPC32/ELF Patch by Justin Hibbits! llvm-svn: 213960	2014-07-25 17:47:22 +00:00
Bill Schmidt	1861624c2e	[PATCH][PPC64LE] Correct little-endian usage of vmrgh* and vmrgl. Because the PowerPC vmrgh and vmrgl* instructions have a built-in big-endian bias, it is necessary to swap their inputs in little-endian mode when using them to implement a vector shuffle. This was previously missed in the vector LE implementation. There was already logic to distinguish between unary and "normal" vmrg* vector shuffles, so this patch extends that logic to use a third option: "swapped" vmrg* vector shuffles that are used for little endian in place of the "normal" ones. I've updated the vec-shuffle-le.ll test to check for the expected register ordering on the generated instructions. This bug was discovered when testing the LE and ELFv2 patches for safety if they were backported to 3.4. A different vectorization decision was made in 3.4 than on mainline trunk, and that exposed the problem. I've verified this fix takes care of that issue. llvm-svn: 213915	2014-07-25 01:55:55 +00:00
Joerg Sonnenberger	97f682d3e2	Don't use 128bit functions on PPC32. llvm-svn: 213899	2014-07-24 22:20:10 +00:00
NAKAMURA Takumi	509350ce96	Prune dependency to MC from each target disassembler. llvm-svn: 213856	2014-07-24 11:45:11 +00:00
NAKAMURA Takumi	dbff653a5a	Update library dependencies. llvm-svn: 213832	2014-07-24 02:10:42 +00:00
Ulrich Weigand	eb914f2256	[PowerPC] ELFv2 aggregate passing support This patch adds infrastructure support for passing array types directly. These can be used by the front-end to pass aggregate types (coerced to an appropriate array type). The details of the array type being used inform the back-end about ABI-relevant properties. Specifically, the array element type encodes: - whether the parameter should be passed in FPRs, VRs, or just GPRs/stack slots (for float / vector / integer element types, respectively) - what the alignment requirements of the parameter are when passed in GPRs/stack slots (8 for float / 16 for vector / the element type size for integer element types) -- this corresponds to the "byval align" field Using the infrastructure provided by this patch, a companion patch to clang will enable two features: - In the ELFv2 ABI, pass (and return) "homogeneous" floating-point or vector aggregates in FPRs and VRs (this is similar to the ARM homogeneous aggregate ABI) - As an optimization for both ELFv1 and ELFv2 ABIs, pass aggregates that fit fully in registers without using the "byval" mechanism The patch uses the functionArgumentNeedsConsecutiveRegisters callback to encode that special treatment is required for all directly-passed array types. The isInConsecutiveRegs / isInConsecutiveRegsLast bits set as a results are then used to implement the required size and alignment rules in CalculateStackSlotSize / CalculateStackSlotAlignment etc. As a related change, the ABI routines have to be modified to support passing floating-point types in GPRs. This is necessary because with homogeneous aggregates of 4-byte float type we can now run out of FPRs before we run out of the 64-byte argument save area that is shadowed by GPRs. Any extra floating-point arguments that no longer fit in FPRs must now be passed in GPRs until we run out of those too. Note that there was already code to pass floating-point arguments in GPRs used with vararg parameters, which was done by writing the argument out to the argument save area first and then reloading into GPRs. The patch re-implements this, however, in favor of code packing float arguments directly via extension/truncation, BITCAST, and BUILD_PAIR operations. This is required to support the ELFv2 ABI, since we cannot unconditionally write to the argument save area (which the caller might not have allocated). The change does, however, affect ELFv1 varags routines too; but even here the overall effect should be advantageous: Instead of loading the argument into the FPR, then storing the argument to the stack slot, and finally reloading the argument from the stack slot into a GPR, the new code now just loads the argument into the FPR, and subsequently loads the argument into the GPR (via BITCAST). That BITCAST might imply a save/reload from a stack temporary (in which case we're no worse than before); but it might be implemented more efficiently in some cases. The final part of the patch enables up to 8 FPRs and VRs for argument return in PPCCallingConv.td; this is required to support returning ELFv2 homogeneous aggregates. (Note that this doesn't affect other ABIs since LLVM wil only look for which register to use if the parameter is marked as "direct" return anyway.) Reviewed by Hal Finkel. llvm-svn: 213493	2014-07-21 00:13:26 +00:00
Ulrich Weigand	9fcc5caf2d	[PowerPC] ELFv2 explicit CFI for CR fields This is a minor improvement in the ELFv2 ABI. In ELFv1, DWARF CFI would represent a saved CR word (holding CR fields CR2, CR3, and CR4) using just a single CFI record refering to CR2. In ELFv2 instead, each of the CR fields is represented by its own CFI record. The advantage is that the compiler can now chose to save just a single (or two) CR fields instead of all of them, if those are the only ones that actually need saving. That can lead to more efficient code using mf(o)crf instead of the (slow) mfcr instruction. Note that this patch does not (yet) implement this more efficient code generation, but it does implement the part that is required to be ABI compliant: creating multiple CFI records if multiple CR fields are saved. Reviewed by Hal Finkel. llvm-svn: 213492	2014-07-21 00:03:18 +00:00
Ulrich Weigand	fb90fdfb31	[PowerPC] ELFv2 stack space reduction The ELFv2 ABI reduces the amount of stack required to implement an ABI-compliant function call in two ways: * the "linkage area" is reduced from 48 bytes to 32 bytes by eliminating two unused doublewords * the 64-byte "parameter save area" is now optional and need not be present in certain cases (it remains mandatory in functions with variable arguments, and functions that have any parameter that is passed on the stack) The following patch implements this required changes: - reducing the linkage area, and associated relocation of the TOC save slot, in getLinkageSize / getTOCSaveOffset (this requires updating all callers of these routines to pass in the isELFv2ABI flag). - (partially) handling the case where the parameter save are is optional This latter part requires some extra explanation: Currently, we still always allocate the parameter save area when calling a function. That is certainly always compliant with the ABI, but may cause code to allocate stack unnecessarily. This can be addressed by a follow-on optimization patch. On the callee side, in LowerFormalArguments, we must track correctly whether the ABI guarantees that the caller has allocated the parameter save area for our use, and the patch does so. However, there is one complication: the code that handles incoming "byval" arguments will currently always write to the parameter save area, because it has to force incoming register arguments to the stack since it must return an address to implement the byval semantics. To fix this, the patch changes the LowerFormalArguments code to write arguments to a freshly allocated stack slot on the function's own stack frame instead of the argument save area in those cases where that area is not present. Reviewed by Hal Finkel. llvm-svn: 213490	2014-07-20 23:43:15 +00:00
Ulrich Weigand	41e116ee77	[PowerPC] ELFv2 function call changes This patch builds upon the two preceding MC changes to implement the basic ELFv2 function call convention. In the ELFv1 ABI, a "function descriptor" was associated with every function, pointing to both the entry address and the related TOC base (and a static chain pointer for nested functions). Function pointers would actually refer to that descriptor, and the indirect call sequence needed to load up both entry address and TOC base. In the ELFv2 ABI, there are no more function descriptors, and function pointers simply refer to the (global) entry point of the function code. Indirect function calls simply branch to that address, after loading it up into r12 (as required by the ABI rules for a global entry point). Direct function calls continue to just do a "bl" to the target symbol; this will be resolved by the linker to the local entry point of the target function if it is local, and to a PLT stub if it is global. That PLT stub would then load the (global) entry point address of the final target into r12 and branch to it. Note that when performing a local function call, r2 must be set up to point to the current TOC base: if the target ends up local, the ABI requires that its local entry point is called with r2 set up; if the target ends up global, the PLT stub requires that r2 is set up. This patch implements all LLVM changes to implement that scheme: - No longer create a function descriptor when emitting a function definition (in EmitFunctionEntryLabel) - Emit two entry points if the function needs the TOC base (r2) anywhere (this is done EmitFunctionBodyStart; note that this cannot be done in EmitFunctionBodyStart because the global entry point prologue code must be part of the function as covered by debug info). - In order to make use tracking of r2 (as needed above) work correctly, mark direct function calls as implicitly using r2. - Implement the ELFv2 indirect function call sequence (no function descriptors; load target address into r12). - When creating an ELFv2 object file, emit the .abiversion 2 directive to tell the linker to create the appropriate version of PLT stubs. Reviewed by Hal Finkel. llvm-svn: 213489	2014-07-20 23:31:44 +00:00
Ulrich Weigand	28c7be6585	[MC] Pass MCSymbolData to needsRelocateWithSymbol As discussed in a previous checking to support the .localentry directive on PowerPC, we need to inspect the actual target symbol in needsRelocateWithSymbol to make the appropriate decision based on that symbol's st_other bits. Currently, needsRelocateWithSymbol does not get the target symbol. However, it is directly available to its sole caller. This patch therefore simply extends the needsRelocateWithSymbol by a new parameter "const MCSymbolData &SD", passes in the target symbol, and updates all derived implementations. In particular, in the PowerPC implementation, this patch removes the FIXME added by the previous checkin. llvm-svn: 213487	2014-07-20 23:15:06 +00:00
Ulrich Weigand	1376d19372	[PowerPC] ELFv2 MC support for .localentry directive A second binutils feature needed to support ELFv2 is the .localentry directive. In the ELFv2 ABI, functions may have two entry points: one for calling the routine locally via "bl", and one for calling the function via function pointer (either at the source level, or implicitly via a PLT stub for global calls). The two entry points share a single ELF symbol, where the ELF symbol address identifies the global entry point address, while the local entry point is found by adding a delta offset to the symbol address. That offset is encoded into three platform-specific bits of the ELF symbol st_other field. The .localentry directive instructs the assembler to set those fields to encode a particular offset. This is typically used by a function prologue sequence like this: func: addis r2, r12, (.TOC.-func)@ha addi r2, r2, (.TOC.-func)@l .localentry func, .-func Note that according to the ABI, when calling the global entry point, r12 must be set to point the global entry point address itself; while when calling the local entry point, r2 must be set to point to the TOC base. The two instructions between the global and local entry point in the above example translate the first requirement into the second. This patch implements support in the PowerPC MC streamers to emit the .localentry directive (both into assembler and ELF object output), as well as support in the assembler parser to parse that directive. In addition, there is another change required in MC fixup/relocation handling to properly deal with relocations targeting function symbols with two entry points: When the target function is known local, the MC layer would immediately handle the fixup by inserting the target address -- this is wrong, since the call may need to go to the local entry point instead. The GNU assembler handles this case by not directly resolving fixups targeting functions with two entry points, but always emits the relocation and relies on the linker to handle this case correctly. This patch changes LLVM MC to do the same (this is done via the processFixupValue routine). Similarly, there are cases where the assembler would normally emit a relocation, but "simplify" it to a relocation targeting a section instead of the actual symbol. For the same reason as above, this may be wrong when the target symbol has two entry points. The GNU assembler again handles this case by not performing this simplification in that case, but leaving the relocation targeting the full symbol, which is then resolved by the linker. This patch changes LLVM MC to do the same (via the needsRelocateWithSymbol routine). NOTE: The method used in this patch is overly pessimistic, since the needsRelocateWithSymbol routine currently does not have access to the actual target symbol, and thus must always assume that it might have two entry points. This will be improved upon by a follow-on patch that modifies common code to pass the target symbol when calling needsRelocateWithSymbol. Reviewed by Hal Finkel. llvm-svn: 213485	2014-07-20 23:06:03 +00:00
Ulrich Weigand	22549677c8	[PowerPC] ELFv2 MC support for .abiversion directive ELFv2 binaries are marked by a bit in the ELF header e_flags field. A new assembler directive .abiversion can be used to set that flag. This patch implements support in the PowerPC MC streamers to emit the .abiversion directive (both into assembler and ELF binary output), as well as support in the assembler parser to parse the .abiversion directive. Reviewed by Hal Finkel. llvm-svn: 213484	2014-07-20 22:56:57 +00:00
Ulrich Weigand	d1393de80e	[PowerPC] Refactor byval handling in LowerFormalArguments_64SVR4 When handling an incoming byval argument, we need to possibly write incoming registers to the stack in order to create an on-stack image of the parameter, so we can return its address to common code. This currently uses CreateFixedObject to access the parts of the parameter save area where the argument is (or needs to be) stored. However, sometimes we need to access multiple parts of that area, e.g. to write multiple registers. The code currently uses a new CreateFixedObject call for each of these accesses, resulting in a patchwork of overlapping (fixed) stack objects. This doesn't really matter in the case of fixed objects, since any access to those turns into a fixed stackpointer + offset address anyway. However, with the upcoming ELFv2 patches, we may actually need to place an incoming argument into our own stack frame instead of the caller's. This means we need to use CreateStackObject instead, and we cannot have multiple overlapping instances of those. To make the rest of the argument handling code work equally in both situations, this patch refactors it to always use just a single call to CreateFixedObject, and access parts of that object as required using address arithmetic. This way, we can in a future patch substitute CreateStackObject without further changes. No change to generated code intended. llvm-svn: 213483	2014-07-20 22:36:52 +00:00
Ulrich Weigand	c274d8aae7	[PowerPC] Fix FrameIndex handling in SelectAddressRegImm The PPCTargetLowering::SelectAddressRegImm routine needs to handle FrameIndex nodes in a special manner, by tranlating them into a TargetFrameIndex node. This was done in most cases, but seems to have been neglected in one path: when the input tree has an OR of the FrameIndex with an immediate. This can happen if the FrameIndex can be proven to be sufficiently aligned that an OR of that immediate is equivalent to an ADD. The missing handling of FrameIndex in that case caused the SelectionDAG instruction selection to miss opportunities to merge the OR back into the FrameIndex node, leading to superfluous addi/ori instructions in the final assembler output. llvm-svn: 213482	2014-07-20 22:26:40 +00:00
Hal Finkel	006e1d44a6	[PowerPC] 32-bit ELF PIC support This adds initial support for PPC32 ELF PIC (Position Independent Code; the -fPIC variety), thus rectifying a long-standing deficiency in the PowerPC backend. Patch by Justin Hibbits! llvm-svn: 213427	2014-07-18 23:29:49 +00:00
Sanjay Patel	2f0f025b2b	Move Post RA Scheduling flag bit into SchedMachineModel Refactoring; no functional changes intended Removed PostRAScheduler bits from subtargets (X86, ARM). Added PostRAScheduler bit to MCSchedModel class. This bit is set by a CPU's scheduling model (if it exists). Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses. Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!). Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling. Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values. Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86: a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget. b. MIPS overrides the CPU's postRA settings by enabling postRA for everything. c. PPC overrides the CPU's postRA settings by enabling postRA for everything. d. X86 is the only target that actually has postRA specified via sched model info. Differential Revision: http://reviews.llvm.org/D4217 llvm-svn: 213101	2014-07-15 22:39:58 +00:00
NAKAMURA Takumi	365309bb8b	Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt. I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this were overkill. llvm-svn: 213064	2014-07-15 11:37:03 +00:00
Ulrich Weigand	1aae06e395	[PowerPC] Fix invalid displacement created by LocalStackAlloc This commit fixes a bug in PPCRegisterInfo::isFrameOffsetLegal that could result in the LocalStackAlloc pass creating an MI instruction out-of-range displacement: %vreg17<def> = LD 33184, %vreg31; mem:LD8[%g](align=32) %G8RC:%vreg17 G8RC_and_G8RC_NOX0:%vreg31 (In final assembler output the top bits are stripped off, resulting in a negative offset loading from below the stack pointer.) Common code expects the isFrameOffsetLegal routine to verify whether adding a given offset to the offset already present in the instruction results in a valid displacement. However, on PowerPC the routine did not take the already present instruction offset into account. This commit fixes isFrameOffsetLegal to add the instruction offset, and updates a local caller (needsFrameBaseReg) to no longer add the instruction offset itself before calling isFrameOffsetLegal. Reviewed by Hal Finkel. llvm-svn: 212832	2014-07-11 17:19:31 +00:00
Ulrich Weigand	1078e34b76	[PowerPC] Implement atomic NAND operations as actual NAND This changes the implementation of atomic NAND operations from "a & ~b" (compatible with GCC < 4.4) to actual "~(a & b)" (compatible with GCC >= 4.4). This is in line with the common-code and ARM back-end change implemented in r212433. llvm-svn: 212547	2014-07-08 16:16:02 +00:00
Ulrich Weigand	5e8a0b585b	[PowerPC] Fix no-assert build r212476 caused a compile failure (unused variable) in a non-assertion build ... llvm-svn: 212477	2014-07-07 19:39:44 +00:00
Ulrich Weigand	8f2aff0b0c	[PowerPC] Fix "byval align" arguments Arguments passed as "byval align" should get the specified alignment in the parameter save area. There was some code in PPCISelLowering.cpp that attempted to implement this, but this didn't work correctly: while code did update the ArgOffset value, it neglected to update the PtrOff value (which was already computed from the old ArgOffset), and it also neglected to update GPR_idx -- fields skipped due to alignment in the save area must likewise be skipped in GPRs. This patch fixes and simplifies this logic by: - handling argument offset alignment right at the beginning of argument processing, using a new helper routine CalculateStackSlotAlignment (this avoids having to update PtrOff and other derived values later on) - not tracking GPR_idx separately, but always computing the correct GPR_idx for each argument from its ArgOffset - removing some redundant computation in LowerFormalArguments: MinReservedArea must equal ArgOffset after argument processing, so there's no use in computing it twice. [This doesn't change the behavior of the current clang front-end, since that never creates "byval align" arguments at the moment. This will change with a follow-on patch, however.] llvm-svn: 212476	2014-07-07 19:26:41 +00:00
Juergen Ributzka	241f08ad4e	[DAG] Pass the argument list to the CallLoweringInfo via move semantics. NFCI. The argument list vector is never used after it has been passed to the CallLoweringInfo and moving it to the CallLoweringInfo is cleaner and pretty much as cheap as keeping a pointer to it. llvm-svn: 212135	2014-07-01 22:01:54 +00:00
Craig Topper	4c15d35f50	Add ops() method to SDNode that returns an ArrayRef<SDUse>. Use it to simplify some code. llvm-svn: 211993	2014-06-29 00:40:57 +00:00
Ulrich Weigand	85c764c15a	[PowerPC] Constrain base register in PPCRegisterInfo::resolveFrameIndex I've run into a bug where current LLVM at -O0 (with fast-isel) generated invalid code like: ld 0, 20936(1) # 8-byte Folded Reload stw 12, 10348(0) stw 12, 10344(0) The underlying vreg had been introduced as base register by the Local Stack Slot Allocation pass. That register was constrained to G8RC by PPCRegisterInfo::materializeFrameBaseRegister to match the ADDI instruction used to set it, but it was not constrained to G8RC_NOX0 to fit the use of the register in an address. That should have happened in PPCRegisterInfo::resolveFrameIndex. This patch adds an appropriate constrainRegClass call. Reviewed by Hal Finkel. llvm-svn: 211897	2014-06-27 13:04:12 +00:00
Eric Christopher	04713c650c	Remove extraneous includes from the target machines. llvm-svn: 211800	2014-06-26 19:30:05 +00:00
Will Schmidt	fbbc998175	add ppc64/pwr8 as target includes handling DIR_PWR8 where appropriate The P7Model Itinerary is currently tied in for use under the P8Model, and will be updated later. llvm-svn: 211779	2014-06-26 13:36:19 +00:00
Rafael Espindola	196881cf29	Move expression visitation logic up to MCStreamer. Remove the duplicate from MCRecordStreamer. No functionality change. llvm-svn: 211714	2014-06-25 15:45:33 +00:00
Rafael Espindola	0e71694bfa	Simplify the visitation of target expressions. No functionality change. llvm-svn: 211707	2014-06-25 15:29:54 +00:00
Bill Schmidt	bfe90f8c83	[PPC64] Fix PR20071 (fctiduz generated for targets lacking that instruction) PR20071 identifies a problem in PowerPC's fast-isel implementation for floating-point conversion to integer. The fctiduz instruction was added in Power ISA 2.06 (i.e., Power7 and later). However, this instruction is being generated regardless of which 64-bit PowerPC target is selected. The intent is for fast-isel to punt to DAG selection when this instruction is not available. This patch implements that change. For testing purposes, the existing fast-isel-conversion.ll test adds a RUN line for -mcpu=970 and tests for the expected code generation. Additionally, the existing test fast-isel-conversion-p5.ll was found to be incorrectly expecting the unavailable instruction to be generated. I've removed these test variants since we have adequate coverage in fast-isel-conversion.ll. llvm-svn: 211627	2014-06-24 20:05:18 +00:00
Ulrich Weigand	b36d130d19	[PowerPC] Refactor getMinCallFrameSize / getMinCallArgumentsSize As of r211495, the only remaining users of getMinCallFrameSize are in core ABI code (LowerFormalParameter / LowerCall). This is actually a good thing, since the details of the parameter save area are ABI specific. With the new ELFv2 ABI in particular, the rules defining the size of the save area will become significantly more complex, so it wouldn't make sense to implement those outside ABI code that has all required information. In preparation, this patch eliminates the getMinCallFrameSize (and associated getMinCallArgumentsSize) routines, and inlines them into all callers. Note that since nearly all call arguments are constant, this allows simplifying the inlined copies to a single line everywhere. No change in generate code expected. llvm-svn: 211497	2014-06-23 14:15:53 +00:00
Ulrich Weigand	ac0412b387	[PowerPC] Allow stack frames without parameter save area The PPCFrameLowering::determineFrameLayout routine currently ensures that every function that allocates a stack frame provides space for the parameter save area (via PPCFrameLowering::getMinCallFrameSize). This is actually not necessary. There may be functions that never call another routine but still allocate a frame; those do not require the parameter save area. In the future, with the ELFv2 ABI, even some routines that do call other functions do not need to allocate the parameter save area. While it is not a bug to allocate the parameter area when it is not needed, it is better to avoid it to save stack space. Note that when any particular function call requires the parameter save area, this space will already have been included by ABI code in the size the CALLSEQ_START insn is annotated with, and therefore included in the size returned by MFI->getMaxCallFrameSize(). This means that determineFrameLayout simply does not need to care about the parameter save area. (It still needs to ensure that every frame provides the linkage area.) This is implemented by this patch. Note that this exposed a bug in the new fast-isel code where the parameter area was not included in the CALLSEQ_START size; this is also fixed. A couple of test cases needed to be adapted for the new (smaller) stack frame size those tests now see. llvm-svn: 211495	2014-06-23 13:47:52 +00:00
Ulrich Weigand	cd08720d8e	[PowerPC] Fix IsDarwin arg in PPCFrameLowering:: calls As remarked in the commit message to r211493, in several places throughout the 64-bit SVR4 ABI code there are calls to PPCFrameLowering::getLinkageSize and getMinCallFrameSize using an incorrect IsDarwin argument of "true". (Some of those were made explicit by the above refactoring patch, others have been there all along.) This patch fixes those places to pass "false" for IsDarwin. No change in generated code expected. llvm-svn: 211494	2014-06-23 13:21:43 +00:00
Ulrich Weigand	186fcfa6d3	[PowerPC] Refactor setMinReservedArea and CalculateParameterAndLinkageAreaSize The PPCISelLowering.cpp routines PPCTargetLowering::setMinReservedArea and CalculateParameterAndLinkageAreaSize are currently used as subroutines from both 64-bit SVR4 and Darwin ABI code. However, the two ABIs are already quite different w.r.t. AltiVec conventions, and they will become more different when the ELFv2 ABI is supported. Also, in general it seems better to disentangle ABI support routines for different ABIs to avoid accidentally affecting one ABI when intending to change only the other. (Actually, the current code strictly speaking already contains a bug: these routines call PPCFrameLowering::getMinCallFrameSize and PPCFrameLowering::getLinkageSize with the IsDarwin parameter set to "true" even on 64-bit SVR4. This bug currently has no adverse effect since those routines always return the same for 64-bit SVR4 and 64-bit Darwin, but it still seems wrong ... I'll fix this in a follow-up commit shortly.) To remove this code sharing, I'm simply inlining both routines into all call sites (there are just two each, one for 64-bit SVR4 and one for Darwin), and simplifying due to constant parameters where possible. A small piece of code that does make sense to share is refactored into the new routine EnsureStackAlignment, now also called from 32-bit SVR4 ABI code. No change in generated code is expected. llvm-svn: 211493	2014-06-23 13:08:27 +00:00
Ulrich Weigand	e1fa36144d	[PowerPC] Fix on-stack AltiVec arguments with 64-bit SVR4 Current 64-bit SVR4 code seems to have some remnants of Darwin code in AltiVec argument handing. This had the effect that AltiVec arguments (or subsequent arguments) were not correctly placed in the parameter area in some cases. The correct behaviour with the 64-bit SVR4 ABI is: - All AltiVec arguments take up space in the parameter area, just like any other arguments, whether vararg or not. - They are always 16-byte aligned, skipping a parameter area doubleword (and the associated GPR, if any), if necessary. This patch implements the correct behaviour and adds a test case. (Verified against GCC behaviour via the ABI compat test suite.) llvm-svn: 211492	2014-06-23 12:36:34 +00:00
Ulrich Weigand	3f880bd06e	[PowerPC] Fix small argument stack slot offset for LE When small arguments (structures < 8 bytes or "float") are passed in a stack slot in the ppc64 SVR4 ABI, they must reside in the least significant part of that slot. On BE, this means that an offset needs to be added to the stack address of the parameter, but on LE, the least significant part of the slot has the same address as the slot itself. This changes the PowerPC back-end ABI code to only add the small argument stack slot offset for BE. It also adds test cases to verify the correct behavior on both BE and LE. llvm-svn: 211368	2014-06-20 16:34:05 +00:00
Ulrich Weigand	c393e04673	[PowerPC] Remove unnecessary load of r12 in indirect call When looking at the 64-bit SVR4 indirect call sequence, I noticed an unnecessary load of r12. And indeed the code says: // R12 must contain the address of an indirect callee. But this is not correct; in the 64-bit SVR4 (ELFv1) ABI, there is no need to load r12 at this point. It seems this code and comment is a remnant of code originally shared with the Darwin ABI ... This patch simply removes the unnecessary load. llvm-svn: 211203	2014-06-18 18:33:36 +00:00
Ulrich Weigand	e256aa48b9	[PowerPC] Simplify and improve loading into TOC register During an indirect function call sequence on the 64-bit SVR4 ABI, generate code must load and then restore the TOC register. This does not use a regular LOAD instruction since the TOC register r2 is marked as reserved. Instead, the are two special instruction patterns: let RST = 2, DS = 2 in def LDinto_toc: DSForm_1a<58, 0, (outs), (ins g8rc:$reg), "ld 2, 8($reg)", IIC_LdStLD, [(PPCload_toc i64:$reg)]>, isPPC64; let RST = 2, DS = 10, RA = 1 in def LDtoc_restore : DSForm_1a<58, 0, (outs), (ins), "ld 2, 40(1)", IIC_LdStLD, [(PPCtoc_restore)]>, isPPC64; Note that these not only restrict the destination of the load to r2, but they also restrict the source of the load to particular address combinations. The latter is a problem when we want to support the ELFv2 ABI, since there the TOC save slot is no longer at 40(1). This patch replaces those two instructions with a single instruction pattern that only hard-codes r2 as destination, but supports generic addresses as source. This will allow supporting the ELFv2 ABI, and also helps generate more efficient code for calls to absolute addresses (allowing simplification of the ppc64-calls.ll test case). llvm-svn: 211193	2014-06-18 17:52:49 +00:00
Ulrich Weigand	1458f67d3e	[PowerPC] Do not use BLA with the 64-bit SVR4 ABI The PowerPC back-end uses BLA to implement calls to functions at known-constant addresses, which is apparently used for certain system routines on Darwin. However, with the 64-bit SVR4 ABI, this is actually incorrect. An immediate function pointer value on this platform is not directly usable as a target address for BLA: - in the ELFv1 ABI, the function pointer value refers to the function descriptor, not the code address - in the ELFv2 ABI, the function pointer value refers to the global entry point, but BL(A) would only be correct when calling the local entry point This bug didn't show up since using immediate function pointer values is not usually done in the 64-bit SVR4 ABI in the first place. However, I ran into this issue with a certain use case of LLVM as JIT, where immediate function pointer values were uses to implement callbacks from JITted code to helpers in statically compiled code. Fixed by simply not using BLA with the 64-bit SVR4 ABI. llvm-svn: 211174	2014-06-18 16:14:04 +00:00
Ulrich Weigand	cfcf14bd4c	[PowerPC] Fix emitting instruction pairs on LE My patch r204634 to emit instructions in little-endian format failed to handle those special cases where we emit a pair of instructions from a single LLVM MC instructions (like the bl; nop pairs used to implement the call sequence). In those cases, we still need to emit the "first" instruction (the one in the more significant word) first, on both big and little endian, and not swap them. llvm-svn: 211171	2014-06-18 15:37:07 +00:00
Bill Schmidt	b0bab996e0	[PPC64] Fix PR19893 - improve code generation for local function addresses Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal code generation for forming a function address that is local to the compile unit. The existing code was treating both local and non-local functions identically. This patch fixes the problem by properly identifying local functions and generating the proper addis/addi code. I also noticed that Rafael's earlier changes to correct the surrounding code in PPCISelLowering.cpp were also needed for fast instruction selection in PPCFastISel.cpp, so this patch fixes that code as well. The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new code generation. I've added a -O0 run line to test the fast-isel code as well. Tested on powerpc64[le]-unknown-linux-gnu with no regressions. llvm-svn: 211056	2014-06-16 21:36:02 +00:00
Eric Christopher	2ea2870f36	The hazard recognizer only needs a subtarget, not a target machine so make it take one. Fix up all users accordingly. llvm-svn: 210948	2014-06-13 22:38:52 +00:00
Eric Christopher	72e46a5bfe	Fix typo. llvm-svn: 210947	2014-06-13 22:38:48 +00:00
Eric Christopher	286dd39af0	Move the PPCSelectionDAGInfo off the TargetMachine and onto the subtarget. llvm-svn: 210854	2014-06-12 23:02:32 +00:00
Eric Christopher	0382d62f87	Make PPCSelectionDAGInfo take a DataLayout instead of a TargetMachine since that's all it needs. llvm-svn: 210853	2014-06-12 22:56:48 +00:00
Eric Christopher	42e57db35c	Move PPCTargetLowering off of the TargetMachine and onto the subtarget. llvm-svn: 210852	2014-06-12 22:50:10 +00:00
Eric Christopher	576f8a1a6b	Remove an extraneous this-> to access the subtarget. llvm-svn: 210849	2014-06-12 22:38:20 +00:00
Eric Christopher	d1b1190bd1	Rename PPCSubTarget to Subtarget in PPCTargetLowering for consistency. Also remove an extra local subtarget in the initialization functions. llvm-svn: 210848	2014-06-12 22:38:18 +00:00
Eric Christopher	429be5d609	Move PPCJITInfo off of the TargetMachine and onto the subtarget. Needed to migrate a few functions around to avoid circular header dependencies. llvm-svn: 210845	2014-06-12 22:28:06 +00:00
Eric Christopher	40ba57e7f5	Remove the use of TargetMachine from PPCJITInfo and replace with the subtarget. Also remove unnecessary argument to the constructor at the same time, we already have access via the subtarget. llvm-svn: 210844	2014-06-12 22:19:51 +00:00
Eric Christopher	95b7901fd6	Move PPCInstrInfo off of the target machine and onto the subtarget. llvm-svn: 210839	2014-06-12 22:05:46 +00:00
Eric Christopher	d4532ed073	Remove TargetMachine from PPCInstrInfo and all dependencies and replace with the current subtarget. llvm-svn: 210836	2014-06-12 21:48:52 +00:00
Eric Christopher	fe75c4c997	Move DataLayout from the PPCTargetMachine to the subtarget. llvm-svn: 210824	2014-06-12 21:08:06 +00:00
Eric Christopher	0c6467adf3	Move PPCFrameLowering into PPCSubtarget from PPCTargetMachine. Use the initializeSubtargetDependencies code to obtain an initialized subtarget and migrate a couple of subtarget using functions to the .cpp file to avoid circular includes. llvm-svn: 210822	2014-06-12 20:54:11 +00:00
Eric Christopher	a30bc53a6b	Remove duplicate copy of InstrItineraryData from the TargetMachine, it's already on the subtarget. llvm-svn: 210619	2014-06-11 00:53:17 +00:00
Bill Schmidt	6e11183ad7	[PPC64LE] Recognize shufflevector patterns for little endian Various masks on shufflevector instructions are recognizable as specific PowerPC instructions (vector pack, vector merge, etc.). There is existing code in PPCISelLowering.cpp to recognize the correct patterns for big endian code. The masks for these instructions are different for little endian code due to the big-endian numbering employed by these instructions. This patch adds the recognition code for little endian. I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this. The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and difficult to read, so I felt it was better to add a new test rather than modify the old one. llvm-svn: 210536	2014-06-10 14:35:01 +00:00
Bill Schmidt	41cd7375c8	[PPC64LE] Generate correct code for unaligned little-endian vector loads The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned Altivec vector loads generates a lvsl followed by a vperm. As we've seen in numerous other places, the vperm instruction has a big-endian bias, and this is fixed for little endian by complementing the permute control vector and swapping the input operands. In this case the lvsl is providing the permute control vector. Rather than generating an lvsl and a complement operation, it is sufficient to generate an lvsr instruction instead. Thus for LE code generation we will generate an lvsr rather than an lvsl, and swap the other input arguments on the vperm. The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5 testing. llvm-svn: 210493	2014-06-09 22:00:52 +00:00
Bill Schmidt	3ff0a8eb8b	[PPC64LE] Generate correct little-endian code for v16i8 multiply The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8 values assumes that vector elements are numbered in big-endian order. For little-endian targets, the vector element numbering is reversed, but the vmuleub, vmuloub, and vperm instructions still assume big-endian numbering. To account for this, we must adjust the permute control vector and reverse the order of the input registers on the vperm instruction. The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on powerpc64 and powerpc64le targets as well as the original powerpc (32-bit) target. llvm-svn: 210474	2014-06-09 16:06:29 +00:00
David Blaikie	f670b953e7	AsmMatchers: Use unique_ptr to manage ownership of MCParsedAsmOperand I saw at least a memory leak or two from inspection (on probably untested error paths) and r206991, which was the original inspiration for this change. I ran this idea by Jim Grosbach a few weeks ago & he was OK with it. Since it's a basically mechanical patch that seemed sufficient - usual post-commit review, revert, etc, as needed. llvm-svn: 210427	2014-06-08 16:18:35 +00:00
Eric Christopher	db8e2ecde5	Have TargetSelectionDAGInfo take a DataLayout initializer rather than a TargetMachine since the only thing it wants is DataLayout. llvm-svn: 210366	2014-06-06 19:04:48 +00:00
Bill Schmidt	647be1ef2c	[PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian This patch fixes a couple of lowering issues for little endian PowerPC. The code for lowering BUILD_VECTOR contains a number of optimizations that are only valid for big endian. For now, we disable those optimizations for correctness. In the future, we will add analogous optimizations that are correct for little endian. When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make the now-familiar transformation of swapping the input operands and complementing the permute control vector. Correctness of this transformation is tested by the accompanying test case. llvm-svn: 210336	2014-06-06 14:06:26 +00:00
Bill Schmidt	9380b3b7bf	[PPC64LE] Temporarily disable VSX support in little-endian mode This is a preliminary patch for the PowerPC64LE support. In stage 1 of the vector support, we will support the VMX (Altivec) instruction set, but will not yet support the VSX instructions. This is merely a staging issue to provide functional vector support as soon as possible. llvm-svn: 210271	2014-06-05 16:21:13 +00:00
Eric Christopher	44a7e98ce5	Omit else branch after return. llvm-svn: 210034	2014-06-02 17:29:07 +00:00
Eric Christopher	1aad72164e	Have the TLOF creation take a Triple rather than needing a subtarget. llvm-svn: 209937	2014-05-31 00:07:32 +00:00
Eric Christopher	0cc6977494	isSVR4ABI() returned !isDarwin() so just move that to the else block and remove the unreachable code. llvm-svn: 209927	2014-05-30 22:47:53 +00:00
Eric Christopher	f0478ea2df	Rename CreateTLOF->createTLOF to match the rest of the file and the rest of the targets with a similar function name. llvm-svn: 209926	2014-05-30 22:47:48 +00:00
Rafael Espindola	34f4870951	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. This is a fixed version of r209638. The difference is to avoid any change in behavior for functions. The logic for using constant pools for function addresseses is spread over a few places and we have to keep them in sync. llvm-svn: 209821	2014-05-29 15:41:38 +00:00
Rafael Espindola	acdb307db3	[pr19844] Add thread local mode to aliases. This matches gcc's behavior. It also seems natural given that aliases contain other properties that govern how it is accessed (linkage, visibility, dll storage). Clang still has to be updated to expose this feature to C. llvm-svn: 209759	2014-05-28 18:15:43 +00:00
Hal Finkel	47a225fb6c	Revert "[PPC] Use alias symbols in address computation." This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the Clang-compiled TableGen would segfault because it jumped to an invalid address from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E (which is within the command-line parameter registration process)). llvm-svn: 209745	2014-05-28 15:25:06 +00:00
Bill Schmidt	b806d02b5b	[PATCH] Correct type used for VADD_SPLAT optimization on PowerPC In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an optimization for certain patterns to generate one or two vector splats followed by a vector add or subtract. This operation is represented by a VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for the VADD_SPLAT to be assigned the wrong data type, causing incorrect code generation. This patch corrects the problem. Specifically, the code previously assigned the value type of the BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct much of the time, but not always. The problem is that the call to isConstantSplat() may return a SplatBitSize that is not the same as the number of bits in the original element vector type. The correct type to assign is a vector type with the same element bit size as SplatBitSize. The included test case shows an example of this, where the BUILD_VECTOR node has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a splat of 16 for type v8i16, which is the type we must assign to the VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm, which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8 and a vadduhm. This patch also corrected code generation for CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an XFAIL, so we can remove the XFAIL from the test case. llvm-svn: 209662	2014-05-27 15:57:51 +00:00
Rafael Espindola	94cd9a1ed6	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. llvm-svn: 209638	2014-05-26 19:08:19 +00:00
Eric Christopher	1b3b092405	Fix typo. llvm-svn: 209377	2014-05-22 01:21:44 +00:00
Eric Christopher	7898bbfc19	Avoid using subtarget features when initializing the pass pipeline on PPC. llvm-svn: 209376	2014-05-22 01:21:35 +00:00
Eric Christopher	4c56d10ea0	Reset the subtarget for DAGToDAG on every iteration of runOnMachineFunction. This required updating the generated functions and TD file accordingly to be pointers rather than const references. llvm-svn: 209375	2014-05-22 01:07:24 +00:00
Eric Christopher	7880d61aac	Make early if conversion dependent upon the subtarget and add a subtarget hook to enable. Unconditionally add to the pass pipeline for targets that might want to use it. No functional change. llvm-svn: 209340	2014-05-21 23:40:26 +00:00
Adam Nemet	37337f0359	[PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64 backend. We matched an immediate offset with STWX8 even though it only supports register offset. The culprit is the complex-pattern predicate, SelectAddrIdx, which decides that if the offset is not ISD::Constant it must be a register. Many thanks to Bill Schmidt for testing this. llvm-svn: 209219	2014-05-20 17:20:34 +00:00
Benjamin Kramer	600e24a1cb	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Saleem Abdulrasool	501d3b6235	Target: remove old constructors for CallLoweringInfo This is mostly a mechanical change changing all the call sites to the newer chained-function construction pattern. This removes the horrible 15-parameter constructor for the CallLoweringInfo in favour of setting properties of the call via chained functions. No functional change beyond the removal of the old constructors are intended. llvm-svn: 209082	2014-05-17 21:50:17 +00:00
Pete Cooper	fa13048706	Use a sized enum for MachineOperandType. No functionality change llvm-svn: 209048	2014-05-16 23:28:17 +00:00
Rafael Espindola	e809bea68e	Delete getAliasedGlobal. llvm-svn: 209040	2014-05-16 22:37:03 +00:00
Jay Foad	e0eac700cb	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Eric Christopher	935299458d	Fix typo in function name. llvm-svn: 208743	2014-05-14 00:31:15 +00:00
Eric Christopher	1091ab4275	Save the optimization level the subtarget was created with in a member variable and sink the initialization of crbits into the subtarget feature reset code. No functional change, but this refactor will be used in a future commit. llvm-svn: 208726	2014-05-13 20:49:08 +00:00
Hal Finkel	f6f53bcd51	[PowerPC] Add global named register support Support for the intrinsics that read from and write to global named registers is added for r1, r2 and r13 (depending on the subtarget). llvm-svn: 208509	2014-05-11 19:29:11 +00:00
Hal Finkel	ef707e1c3e	[PowerPC] On PPC32, 128-bit shifts might be runtime calls The counter-loops formation pass needs to know what operations might be function calls (because they can't appear in counter-based loops). On PPC32, 128-bit shifts might be runtime calls (even though you can't use __int128 on PPC32, it seems that SROA might form them). Fixes PR19709. llvm-svn: 208501	2014-05-11 16:23:29 +00:00
Rafael Espindola	765e5e78cf	Remove the UseCFI option from createAsmStreamer. We were already always passing true, this just removes the option. llvm-svn: 208205	2014-05-07 13:00:43 +00:00
Rafael Espindola	bb77d317cd	Fix pr19645. The fix itself is fairly simple: move getAccessVariant to MCValue so that we replace the old weak expression evaluation with the far more general EvaluateAsRelocatable. This then requires that EvaluateAsRelocatable stop when it finds a non trivial reference kind. And that in turn requires the ELF writer to look harder for weak references. Last but not least, this found a case where we were being bug by bug compatible with gas and accepting an invalid input. I reported pr19647 to track it. llvm-svn: 207920	2014-05-03 19:57:04 +00:00
Craig Topper	79b097d66a	Use makeArrayRef insted of calling ArrayRef<T> constructor directly. I introduced most of these recently. llvm-svn: 207616	2014-04-30 07:17:30 +00:00
Craig Topper	5f5f448d02	De-virtualize or remove some methods that have no overrides nor override anything. In some cases remove all together if there are no callers either. llvm-svn: 207610	2014-04-30 05:53:27 +00:00
Craig Topper	dcce1d897e	[C++11] Add 'override' keywords and remove 'virtual'. Additionally add 'final' and leave 'virtual' on some methods that are marked virtual without overriding anything and have no obvious overrides themselves. PowerPC edition llvm-svn: 207504	2014-04-29 07:57:37 +00:00
Eric Christopher	4af26f41d6	None of these targets actually define their own CFI_INSTRUCTION opcode so there's no reason to use the target namespace for it rather than TargetOpcode. llvm-svn: 207475	2014-04-29 00:16:46 +00:00
Eric Christopher	17d4cf2e96	80-column, tab characters, comment fixups. llvm-svn: 207473	2014-04-29 00:16:40 +00:00
Craig Topper	9683cb114b	Convert more SelectionDAG functions to use ArrayRef. llvm-svn: 207397	2014-04-28 05:57:50 +00:00
Craig Topper	b663bffa27	[C++] Use 'nullptr'. llvm-svn: 207394	2014-04-28 04:05:08 +00:00
Craig Topper	1efda44640	Convert SelectionDAG::SelectNodeTo to use ArrayRef. llvm-svn: 207377	2014-04-27 19:21:11 +00:00
Craig Topper	536995c0a7	Convert SelectionDAG::getMergeValues to use ArrayRef. llvm-svn: 207374	2014-04-27 19:20:57 +00:00

... 5 6 7 8 9 ...

4513 Commits