llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Hal Finkel	9af1fa0d50	Cleanup and improve PPC fsel generation First, we should not cheat: fsel-based lowering of select_cc is a finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes this clear, as does a note in our own README). This also adds fsel-based lowering of EQ and NE condition codes. As it turned out, fsel generation was covered by a grand total of zero regression test cases. I've added some test cases to cover the existing behavior (which is now finite-math only), as well as the new EQ cases. llvm-svn: 179000	2013-04-07 22:11:09 +00:00
Hal Finkel	b556d850c9	Enable early if conversion on PPC On cores for which we know the misprediction penalty, and we have the isel instruction, we can profitably perform early if conversion. This enables us to replace some small branch sequences with selects and avoid the potential stalls from mispredicting the branches. Enabling this feature required implementing canInsertSelect and insertSelect in PPCInstrInfo; isel code in PPCISelLowering was refactored to use these functions as well. llvm-svn: 178926	2013-04-05 23:29:01 +00:00
Hal Finkel	02fd9b0859	Rename the current PPC BCL definition to BCLalways BCL is normally a conditional branch-and-link instruction, but has an unconditional form (which is used in the SjLj code, for example). To make clear that this BCL instruction definition is specifically the special unconditional form (which does not meaningfully take a condition-register input), rename it to BCLalways. No functionality change intended. llvm-svn: 178803	2013-04-04 22:55:54 +00:00
Hal Finkel	1dc78e666b	PPC: Improve code generation for mixed-precision reciprocal sqrt The DAGCombine logic that recognized a/sqrt(b) and transformed it into a multiplication by the reciprocal sqrt did not handle cases where the sqrt and the division were separated by an fpext or fptrunc. llvm-svn: 178801	2013-04-04 22:44:12 +00:00
Hal Finkel	3e38cb94ec	Cleanup PPC reciprocal-estimate functionality Incorporating review feedback from Bill Schmidt on r178617. No functionality change intended. llvm-svn: 178672	2013-04-03 17:44:56 +00:00
Bill Schmidt	990515e4c4	Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC. For this we need to use a libcall. Previously LLVM didn't implement libcall support for frem, so I've added it in the usual straightforward manner. A test case from the bug report is included. llvm-svn: 178639	2013-04-03 13:05:44 +00:00
Hal Finkel	0208f7c744	Use PPC reciprocal estimates with Newton iteration in fast-math mode When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617	2013-04-03 04:01:11 +00:00
Bill Schmidt	c98ed219d3	Fix PR15630: Replace faulty stdcx. with stwcx. When doing a partword atomic operation, a lwarx was being paired with a stdcx. instead of a stwcx. when compiling for a 64-bit target. The target has nothing to do with it in this case; we always need a stwcx. Thanks to Kai Nacke for reporting the problem. llvm-svn: 178559	2013-04-02 18:37:08 +00:00
Hal Finkel	1adfd9c87a	Fix typo in PPCISelLowering Thanks to Bill Schmidt for finding this in review of r178480. llvm-svn: 178521	2013-04-02 03:29:51 +00:00
Hal Finkel	9191cdb5f2	Fix a bad assert in PPCTargetLowering llvm-svn: 178489	2013-04-01 18:42:58 +00:00
Hal Finkel	f184647a53	Add more PPC floating-point conversion instructions The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available). llvm-svn: 178480	2013-04-01 17:52:07 +00:00
Hal Finkel	9eed3ac928	Add the PPC popcntw instruction The popcntw instruction is available whenever the popcntd instruction is available, and performs a separate popcnt on the lower and upper 32-bits. Ignoring the high-order count, this can be used for the 32-bit input case (saving on the explicit zero extension otherwise required to use popcntd). llvm-svn: 178470	2013-04-01 15:58:15 +00:00
Hal Finkel	91e522ef96	Treat PPCISD::STFIWX like the memory opcode that it is PPCISD::STFIWX is really a memory opcode, and so it should come after FIRST_TARGET_MEMORY_OPCODE, and we should use DAG.getMemIntrinsicNode to create nodes using it. No functionality change intended (although there could be optimization benefits from preserving the MMO information). llvm-svn: 178468	2013-04-01 15:37:53 +00:00
Hal Finkel	085f61160f	Add the PPC lfiwax instruction This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space). llvm-svn: 178446	2013-03-31 10:12:51 +00:00
Hal Finkel	7bdfbd6570	Cleanup PPC(64) i32 -> float/double conversion The existing SINT_TO_FP code for i32 -> float/double conversion was disabled because it relied on broken EXTSW_32/STD_32 instruction definitions. The original intent had been to enable these 64-bit instructions to be used on CPUs that support them even in 32-bit mode. Unfortunately, this form of lying to the infrastructure was buggy (as explained in the FIXME comment) and had therefore been disabled. This re-enables this functionality, using regular DAG nodes, but only when compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead) are removed. llvm-svn: 178438	2013-03-31 01:58:02 +00:00
Hal Finkel	c14a74d243	Implement FRINT lowering on PPC using frin Like nearbyint, rint can be implemented on PPC using the frin instruction. The complication comes from the fact that rint needs to set the FE_INEXACT flag when the result does not equal the input value (and frin does not do that). As a result, we use a custom inserter which, after the rounding, compares the rounded value with the original, and if they differ, explicitly sets the XX bit in the FPSCR register (which corresponds to FE_INEXACT). Once LLVM has better modeling of the floating-point environment we should be able to (often) eliminate this extra complexity. llvm-svn: 178362	2013-03-29 19:41:55 +00:00
Benjamin Kramer	279e5cfa9a	Remove the old CodePlacementOpt pass. It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1. llvm-svn: 178349	2013-03-29 17:14:24 +00:00
Hal Finkel	3493fd8e51	Add PPC FP rounding instructions fri[mnpz] These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode. llvm-svn: 178337	2013-03-29 08:57:48 +00:00
Hal Finkel	88670ad5f4	Only enable 64-bit bswap DAG combines for PPC64 Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were added for bswap with load/store. This is because these combines are really only valid in 64-bit mode, regardless of the CPU (and this was not being checked). llvm-svn: 178286	2013-03-28 20:23:46 +00:00
Hal Finkel	367c3c2189	Fix bad indentation in r178276 Thanks to Bill Schmidt for pointing this out! llvm-svn: 178280	2013-03-28 19:43:12 +00:00
Hal Finkel	f359927db6	Add the PPC64 ldbrx/stdbrx instructions These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs. llvm-svn: 178276	2013-03-28 19:25:55 +00:00
Hal Finkel	c21c3cf09e	Add the PPC64 popcntd instruction PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233	2013-03-28 13:29:47 +00:00
Hal Finkel	ba08d9519e	Fix typo (common to both X86 and PPC) Thanks to Bill Schmidt for pointing this out during code review! llvm-svn: 178170	2013-03-27 19:10:42 +00:00
Ulrich Weigand	5c8ca72b53	PowerPC: Simplify FADD in round-to-zero mode. As part of the the sequence generated to implement long double -> int conversions, we need to perform an FADD in round-to-zero mode. This is problematical since the FPSCR is not at all modeled at the SelectionDAG level, and thus there is a risk of getting floating point instructions generated out of sequence with the instructions to modify FPSCR. The current code handles this by somewhat "special" patterns that in part have dummy operands, and/or duplicate existing instructions, making them awkward to handle in the asm parser. This commit changes this by leaving the "FADD in round-to-zero mode" as an atomic operation on the SelectionDAG level, and only split it up into real instructions at the MI level (via custom inserter). Since at this level the FPSCR is modeled (via the "RM" hard register), much of the "special" stuff can just go away, and the resulting patterns can be used by the asm parser. No significant change in generated code expected. llvm-svn: 178006	2013-03-26 10:56:22 +00:00
Ulrich Weigand	7edb530b86	PowerPC: Use CCBITRC operand for ISEL patterns. This commit changes the ISEL patterns to use a CCBITRC operand instead of a "pred" operand. This matches the actual instruction text more directly, and simplifies use of ISEL with the asm parser. In addition, this change allows some simplification of handling the "pred" operand, as this is now only used by BCC. No change in generated code. llvm-svn: 178003	2013-03-26 10:54:54 +00:00
Ulrich Weigand	c069500748	Remove ABI-duplicated call instruction patterns. We currently have a duplicated set of call instruction patterns depending on the ABI to be followed (Darwin vs. Linux). This is a bit odd; while the different ABIs will result in different instruction sequences, the actual instructions themselves ought to be independent of the ABI. And in fact it turns out that the only nontrivial difference between the two sets of patterns is that in the PPC64 Linux ABI, the instruction used for indirect calls is marked to take X11 as extra input register (which is indeed used only with that ABI to hold an incoming environment pointer for nested functions). However, this does not need to be hard-coded at the .td pattern level; instead, the C++ code expanding calls can simply add that use, just like it adds uses for argument registers anyway. No change in generated code expected. llvm-svn: 177735	2013-03-22 15:24:13 +00:00
Ulrich Weigand	c8c3c4dc22	Fix swapped BasePtr and Offset in pre-inc memory addresses. PPCTargetLowering::getPreIndexedAddressParts currently provides the base part of a memory address in the offset result, and the offset part in the base result. That swap is then undone again when an MI instruction is generated (in PPCDAGToDAGISel::Select for loads, and using .md Pat patterns for stores). This patch reverts this double swap, to make common code and back-end be in sync as to which part of the address is base and which is offset. To avoid performance regressions in certain cases, target code now checks whether the choice of base register would be rejected for pre-inc accesses by common code, and attempts to swap base and offset again in such cases. (Overall, this means that now pre-ice accesses are generated more frequently than before.) llvm-svn: 177733	2013-03-22 14:58:48 +00:00
Hal Finkel	598c91ae81	Remove the G8RC_NOX0_and_GPRC_NOR0 PPC register class As Jakob pointed out in his review of r177423, having a shared ZERO register between the 32- and 64-bit register classes causes this odd G8RC_NOX0_and_GPRC_NOR0 class to be created. As recommended, this adds a ZERO8 register which differentiates the 32- and 64-bit zeros. No functionality change intended. llvm-svn: 177683	2013-03-21 23:45:03 +00:00
Hal Finkel	7e324aee83	Implement builtin_{setjmp/longjmp} on PPC This implements SJLJ lowering on PPC, making the Clang functions __builtin_{setjmp/longjmp} functional on PPC platforms. The implementation strategy is similar to that on X86, with the exception that a branch-and-link variant is used to get the right jump address. Credit goes to Bill Schmidt for suggesting the use of the unconditional bcl form (instead of the regular bl instruction) to limit return-address-cache pollution. Benchmarking the speed at -O3 of: static jmp_buf env_sigill; void foo() { __builtin_longjmp(env_sigill,1); } main() { ... for (int i = 0; i < c; ++i) { if (__builtin_setjmp(env_sigill)) { goto done; } else { foo(); } done:; } ... } vs. the same code using the libc setjmp/longjmp functions on a P7 shows that this builtin implementation is ~4x faster with Altivec enabled and ~7.25x faster with Altivec disabled. This comparison is somewhat unfair because the libc version must also save/restore the VSX registers which we don't yet support. llvm-svn: 177666	2013-03-21 21:37:52 +00:00
Hal Finkel	2043b2adae	Correct PPC FRAMEADDR lowering using a pseudo-register The old code used to lower FRAMEADDR tried to replicate the logic in the real frame-lowering code that determines whether or not the frame pointer (r31) will be used. When it seemed as through the frame pointer would not be used, the stack pointer (r1) was used instead. Unfortunately, because the stack size is not yet known, this does not work. Instead, this change introduces new always-reserved pseudo-registers (FP and FP8) that are replaced during prologue insertion with the real frame-pointer register (either r1 or r31). It is important that this intrinsic always return a valid frame address because it is used by Clang to store the frame address as part of code generation for __builtin_setjmp. llvm-svn: 177653	2013-03-21 19:03:19 +00:00
Hal Finkel	08d0f0125c	Prepare to make r0 an allocatable register on PPC Currently the PPC r0 register is unconditionally reserved. There are two reasons for this: 1. r0 is treated specially (as the constant 0) by certain instructions, and so cannot be used with those instructions as a regular register. 2. r0 is used as a temporary register in the CR-register spilling process (where, under some circumstances, we require two GPRs). This change addresses the first reason by introducing a restricted register class (without r0) for use by those instructions that treat r0 specially. These register classes have a new pseudo-register, ZERO, which represents the r0-as-0 use. This has the side benefit of making the existing target code simpler (and easier to understand), and will make it clear to the register allocator that uses of r0 as 0 don't conflict will real uses of the r0 register. Once the CR spilling code is improved, we'll be able to allocate r0. Adding these extra register classes, for some reason unclear to me, causes requests to the target to copy 32-bit registers to 64-bit registers. The resulting code seems correct (and causes no test-suite failures), and the new test case covers this new kind of asymmetric copy. As r0 is still reserved, no functionality change intended. llvm-svn: 177423	2013-03-19 18:51:05 +00:00
Hal Finkel	42f72e7756	Fix PPC unaligned 64-bit loads and stores PPC64 supports unaligned loads and stores of 64-bit values, but in order to use the r+i forms, the offset must be a multiple of 4. Unfortunately, this cannot always be determined by examining the immediate itself because it might be available only via a TOC entry. In order to get around this issue, we additionally predicate the selection of the r+i form on the alignment of the load or store (forcing it to be at least 4 in order to select the r+i form). llvm-svn: 177338	2013-03-18 23:00:58 +00:00
Hal Finkel	a5a86f0a8e	Enable unaligned memory access on PPC for scalar types Unaligned access is supported on PPC for non-vector types, and is generally more efficient than manually expanding the loads and stores. A few of the existing test cases were using expanded unaligned loads and stores to test other features (like load/store with update), and for these test cases, unaligned access remains disabled. llvm-svn: 177160	2013-03-15 15:27:13 +00:00
Benjamin Kramer	bc0e70c415	ArrayRefize some code. No functionality change. llvm-svn: 176648	2013-03-07 20:33:29 +00:00
Eli Bendersky	37f247b8d8	Move the eliminateCallFramePseudoInstr method from TargetRegisterInfo to TargetFrameLowering, where it belongs. Incidentally, this allows us to delete some duplicated (and slightly different!) code in TRI. There are potentially other layering problems that can be cleaned up as a result, or in a similar manner. The refactoring was OK'd by Anton Korobeynikov on llvmdev. Note: this touches the target interfaces, so out-of-tree targets may be affected. llvm-svn: 175788	2013-02-21 20:05:00 +00:00
Jim Grosbach	233487d8a2	Update TargetLowering ivars for name policy. http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly ivars should be camel-case and start with an upper-case letter. A few in TargetLowering were starting with a lower-case letter. No functional change intended. llvm-svn: 175667	2013-02-20 21:13:59 +00:00
Bill Schmidt	bcb4fa48fa	Additional fixes for bug 15155. This handles the cases where the 6-bit splat element is odd, converting to a three-instruction sequence to add or subtract two splats. With this fix, the XFAIL in test/CodeGen/PowerPC/vec_constants.ll is removed. llvm-svn: 175663	2013-02-20 20:41:42 +00:00
Bill Schmidt	358367c60f	Fix bug 14779 for passing anonymous aggregates [patch by Kai Nacke]. The PPC backend doesn't handle these correctly. This patch uses logic similar to that in the X86 and ARM backends to track these arguments properly. llvm-svn: 175635	2013-02-20 17:31:41 +00:00
Bill Schmidt	93b2fc9f50	Fix PR15155: lost vadd/vsplat optimization. During lowering of a BUILD_VECTOR, we look for opportunities to use a vector splat. When the splatted value fits in 5 signed bits, a single splat does the job. When it doesn't fit in 5 bits but does fit in 6, and is an even value, we can splat on half the value and add the result to itself. This last optimization hasn't been working recently because of improved constant folding. To circumvent this, create a pseudo VADD_SPLAT that can be expanded during instruction selection. llvm-svn: 175632	2013-02-20 15:50:31 +00:00
Bill Schmidt	93695a5520	PPC calling convention cleanup. Most of PPCCallingConv.td is used only by the 32-bit SVR4 ABI. Rename things to clarify this. Also delete some code that's been commented out for a long time. llvm-svn: 174526	2013-02-06 17:33:58 +00:00
Jakob Stoklund Olesen	ec466d3a2c	Move MRI liveouts to PowerPC return instructions. llvm-svn: 174409	2013-02-05 18:12:00 +00:00
Benjamin Kramer	d07b68b101	Disable a couple more vector splat optimizations on PPC. I didn't see those because the test case used "not grep". FileCheck the test and XFAIL it, preserving the old optimization, so this can be fixed eventually. llvm-svn: 174330	2013-02-04 15:52:32 +00:00
Benjamin Kramer	aa2475fd87	SelectionDAG: Teach FoldConstantArithmetic how to deal with vectors. This required disabling a PowerPC optimization that did the following: input: x = BUILD_VECTOR <i32 16, i32 16, i32 16, i32 16> lowered to: tmp = BUILD_VECTOR <i32 8, i32 8, i32 8, i32 8> x = ADD tmp, tmp The add now gets folded immediately and we're back at the BUILD_VECTOR we started from. I don't see a way to fix this currently so I left it disabled for now. Fix some trivially foldable X86 tests too. llvm-svn: 174325	2013-02-04 15:19:18 +00:00
Evan Cheng	2e2cde560f	Teach SDISel to combine fsin / fcos into a fsincos node if the following conditions are met: 1. They share the same operand and are in the same BB. 2. Both outputs are used. 3. The target has a native instruction that maps to ISD::FSINCOS node or the target provides a sincos library call. Implemented the generic optimization in sdisel and enabled it for Mac OSX. Also added an additional optimization for x86_64 Mac OSX by using an alternative entry point __sincos_stret which returns the two results in xmm0 / xmm1. rdar://13087969 PR13204 llvm-svn: 173755	2013-01-29 02:32:37 +00:00
Chandler Carruth	4c1f3c24db	Move all of the header files which are involved in modelling the LLVM IR into their new header subdirectory: include/llvm/IR. This matches the directory structure of lib, and begins to correct a long standing point of file layout clutter in LLVM. There are still more header files to move here, but I wanted to handle them in separate commits to make tracking what files make sense at each layer easier. The only really questionable files here are the target intrinsic tablegen files. But that's a battle I'd rather not fight today. I've updated both CMake and Makefile build systems (I think, and my tests think, but I may have missed something). I've also re-sorted the includes throughout the project. I'll be committing updates to Clang, DragonEgg, and Polly momentarily. llvm-svn: 171366	2013-01-02 11:36:10 +00:00
Bill Wendling	e0920e4122	Remove the Function::getFnAttributes method in favor of using the AttributeSet directly. This is in preparation for removing the use of the 'Attribute' class as a collection of attributes. That will shift to the AttributeSet class instead. llvm-svn: 171253	2012-12-30 10:32:01 +00:00
Hal Finkel	6b98f1baa2	Expand PPC64 atomic load and store Use of store or load with the atomic specifier on 64-bit types would cause instruction-selection failures. As with the 32-bit case, these can use the default expansion in terms of cmp-and-swap. llvm-svn: 171072	2012-12-25 17:22:53 +00:00
Benjamin Kramer	fac7a73ad8	PowerPC: Expand VSELECT nodes. There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. llvm-svn: 170551	2012-12-19 15:49:14 +00:00
Bill Wendling	56d9c4b832	Rename the 'Attributes' class to 'Attribute'. It's going to represent a single attribute in the future. llvm-svn: 170502	2012-12-19 07:18:57 +00:00
Bill Schmidt	c895316923	This patch improves the 64-bit PowerPC InitialExec TLS support by providing for a wider range of GOT entries that can hold thread-relative offsets. This matches the behavior of GCC, which was not documented in the PPC64 TLS ABI. The ABI will be updated with the new code sequence. Former sequence: ld 9,x@got@tprel(2) add 9,9,x@tls New sequence: addis 9,2,x@got@tprel@ha ld 9,x@got@tprel@l(9) add 9,9,x@tls Note that a linker optimization exists to transform the new sequence into the shorter sequence when appropriate, by replacing the addis with a nop and modifying the base register and relocation type of the ld. llvm-svn: 170209	2012-12-14 17:02:38 +00:00

1 2 3 4 5 ...

738 Commits