llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00

Author	SHA1	Message	Date
NAKAMURA Takumi	4599dee67a	llvm/lib: [CMake] Add explicit dependency to intrinsics_gen. llvm-svn: 159112	2012-06-24 13:32:01 +00:00
Craig Topper	3f4f2125fc	Silence an unused variable warning on release builds. llvm-svn: 159074	2012-06-23 08:09:30 +00:00
Hal Finkel	ebe9ea8bd7	Add support for the PPC isel instruction. The isel (integer select) instruction is supported on the 440 and A2 embedded cores and on the POWER7. llvm-svn: 159045	2012-06-22 23:10:08 +00:00
Hal Finkel	2eb4a5326e	Convert the PPC backend to use the new FMA infrastructure. The existing contraction patterns are replaced with fma/fneg. Overall functionality should be the same. llvm-svn: 158955	2012-06-22 00:49:52 +00:00
Hal Finkel	bc9be7c0e5	Treat TargetGlobalAddress as a constant for the purpose of matching pre-inc stores on PPC. Thanks to Tobias von Koch for pointing out this problem. llvm-svn: 158932	2012-06-21 20:10:48 +00:00
Hal Finkel	a94da28a6d	Add support for generating reg+reg (indexed) pre-inc loads on PPC. llvm-svn: 158823	2012-06-20 15:43:03 +00:00
Lang Hames	f0b9601a6d	Add DAG-combines for aggressive FMA formation. This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or FSUB + FMUL. The combines are performed when: (a) Either AllowExcessFPPrecision option (-enable-excess-fp-precision for llc) OR UnsafeFPMath option (-enable-unsafe-fp-math) are set, and (b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of the FADD/FSUB, and (c) The FMUL only has one user (the FADD/FSUB). If your target has fast FMA instructions you can make use of these combines by overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for types supported by your FMA instruction, and adding patterns to match ISD::FMA to your FMA instructions. llvm-svn: 158757	2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen	66e7517610	Implement PPCInstrInfo::isCoalescableExtInstr(). The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743	2012-06-19 21:14:34 +00:00
Hal Finkel	12c1b6478a	Mark most PPC register classes to avoid write-after-write. For processors with the G5-like instruction-grouping scheme, this helps avoid early group termination due to a write-after-write dependency within the group. It should also help on pipelined embedded cores. On POWER7, over the test suite, this gives an average 0.5% speedup. The largest speedups are: SingleSource/Benchmarks/Stanford/Quicksort - 33% MultiSource/Applications/d/make_dparser - 21% MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12% MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12% Largest slowdowns: SingleSource/Benchmarks/Stanford/Bubblesort - 23% MultiSource/Benchmarks/Prolangs-C++/city/city - 21% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16% MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13% llvm-svn: 158719	2012-06-19 13:57:17 +00:00
Hal Finkel	42b797225a	Add support for generating reg+reg preinc stores on PPC. PPC will now generate STWUX and friends. llvm-svn: 158698	2012-06-19 02:34:32 +00:00
Hal Finkel	40483bafbf	Cleanup trip-count finding for PPC CTR loops (and some bug fixes). This cleans up the method used to find trip counts in order to form CTR loops on PPC. This refactoring allows the pass to find loops which have a constant trip count but also happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different classes of loops that are currently ignored. In addition, we now search through all potential induction operations instead of just the first. Also, we check the predicate code on the conditional branch and abort the transformation if the code is not EQ or NE, and we then make sure that the branch to be transformed matches the condition register defined by the comparison (multiple possible comparisons will be considered). llvm-svn: 158607	2012-06-16 20:34:07 +00:00
Hal Finkel	021ce0f07b	Add another missing 64-bit itinerary definition for the PPC A2 core. llvm-svn: 158393	2012-06-13 05:55:09 +00:00
Hal Finkel	ad66e9569f	Add some missing 64-bit itinerary definitions for the PPC A2 core. llvm-svn: 158373	2012-06-12 20:32:29 +00:00
Hal Finkel	a005be7ae7	Split out the PPC instruction class IntSimple from IntGeneral. On the POWER7, adds and logical operations can also be handled in the load/store pipelines. We'll call these IntSimple. llvm-svn: 158366	2012-06-12 19:01:24 +00:00
Hal Finkel	6a80441b25	Fixes for PPC host detection and features. POWER4 is a 64-bit CPU (better matched to the 970). The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750). Patch by Andreas Tobler. llvm-svn: 158363	2012-06-12 16:39:23 +00:00
Hal Finkel	f29a217a58	Reapply r158337, this time properly protect Darwin/PPC host CPU use with __ppc__. Original commit message: Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158349	2012-06-12 03:03:13 +00:00
Jakob Stoklund Olesen	dd5f904ac0	Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName()." This commit broke most of the PowerPC unit tests when running on Intel/Apple. llvm-svn: 158345	2012-06-12 00:58:40 +00:00
Hal Finkel	d1e6c8928a	Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158337	2012-06-11 23:14:31 +00:00
Hal Finkel	6b88059d5e	Enable MFOCRF generation on the PPC A2 core. llvm-svn: 158324	2012-06-11 19:57:04 +00:00
Hal Finkel	fa52a34d78	Rename the PPC target feature gpul to mfocrf. The PPC target feature gpul (IsGigaProcessor) was only used for one thing: To enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag. No functionality change. llvm-svn: 158323	2012-06-11 19:57:01 +00:00
Hal Finkel	9d5f414e63	Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection. llvm-svn: 158322	2012-06-11 19:56:57 +00:00
Hal Finkel	93a1ca077d	Emit the two-operand form of the PPC mfcr instruction as mfocrf. This is necessary on Linux and supported on Darwin, see PR2604. llvm-svn: 158315	2012-06-11 15:43:15 +00:00
Hal Finkel	90c0af7137	Add local CPU detection for Linux PPC. This functionality mirrors that available on PPC/Darwin. llvm-svn: 158314	2012-06-11 15:43:13 +00:00
Hal Finkel	ab5027a1a2	Add POWER6 and POWER7 CPU types to the PPC backend. No functional change; these will be used by upcoming scheduler enhancements. llvm-svn: 158313	2012-06-11 15:43:08 +00:00
Hal Finkel	b6ac451381	Enable ILP scheduling for all nodes by default on PPC. Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296	2012-06-10 19:32:29 +00:00
Hal Finkel	6416ab5c36	Use critical anti-dep. breaking on all PPC targets, but also add other register classes. Using 'all' instead of 'critical' would be better because it would make it easier to satisfy the bundling constraints, but, as noted in the FIXME, that is currently not possible with the crs. This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups: SingleSource/Benchmarks/Shootout-C++/moments - 40% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26% SingleSource/Benchmarks/McGill/misr - 23% MultiSource/Applications/JM/ldecod/ldecod - 22% Largest slowdowns: SingleSource/Benchmarks/Shootout-C++/matrix - -29% SingleSource/Benchmarks/Shootout-C++/ary3 - -22% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18% SingleSource/Benchmarks/Shootout-C++/ary - -17% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15% llvm-svn: 158294	2012-06-10 11:15:36 +00:00
Hal Finkel	a9b329fcf1	Improve ext/trunc patterns on PPC64. The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283	2012-06-09 22:10:19 +00:00
Hal Finkel	d2d71dd821	Enable tail merging on PPC. Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259	2012-06-09 03:14:50 +00:00
Hal Finkel	96aa8d0716	Remove the TODO statement in the PPC README re: CTR loops As Chris points out, this can now be removed! TODO: check if the associated section on viterbi's inner loop can also be removed. llvm-svn: 158224	2012-06-08 20:02:09 +00:00
Hal Finkel	1424f01791	Enable PPC CTR loop formation by default. Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223	2012-06-08 19:19:53 +00:00
Hal Finkel	44387d4528	Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable. Marking these classes as non-allocatable allows CTR loop generation to work correctly with the block placement passes, etc. These register classes are currently used only by some unused TCRETURN patterns. In future cleanup, these will be removed. Thanks again to Jakob for suggesting this fix to the CTR loop problem! llvm-svn: 158221	2012-06-08 19:02:08 +00:00
Hal Finkel	d05ff520b8	Disable the PPC CTR-Loops pass by default. The pass itself works well, but something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206	2012-06-08 15:38:25 +00:00
Hal Finkel	a6629c556e	Fix a bug in the new PPC CTR-Loops pass. The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example: %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16 llvm-svn: 158205	2012-06-08 15:38:23 +00:00
Hal Finkel	bb4e499e94	Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204	2012-06-08 15:38:21 +00:00
Roman Divacky	f8e2e4beaa	PPC32 uses R2 as the TLS register. Fix the copy and paste. llvm-svn: 158004	2012-06-05 17:14:17 +00:00
Roman Divacky	0daa2c0556	Implement local-exec TLS on PowerPC. llvm-svn: 157935	2012-06-04 17:36:38 +00:00
Hal Finkel	c1daf7daf8	Fix a copy-and-paste duplication error in the PPC 440 and A2 schedules (no functionality change). llvm-svn: 157912	2012-06-04 02:39:52 +00:00
Hal Finkel	c1fe73fae2	Enable generating PPC pre-increment (r+imm) instructions by default. It seems that this no longer causes test suite failures on PPC64 (after r157159), and often gives a performance benefit, so it can be enabled by default. llvm-svn: 157911	2012-06-04 02:21:00 +00:00
Justin Holewinski	77c4679dae	Change interface for TargetLowering::LowerCallTo and TargetLowering::LowerCall to pass around a struct instead of a large set of individual values. This cleans up the interface and allows more information to be added to the struct for future targets without requiring changes to each and every target. NV_CONTRIB llvm-svn: 157479	2012-05-25 16:35:28 +00:00
Hal Finkel	9fad4cf803	Add a missing PPC 64-bit stwu pattern. This seems to fix the remaining compile-time failures on PPC64 when compiling with -enable-ppc-preinc. llvm-svn: 157159	2012-05-20 17:11:24 +00:00
Hal Finkel	259ded3876	Add a FIXME about access to negative stack-pointer offsets on PPC32. The current code will generate a prologue which starts with something like: mflr 0 stw 31, -4(1) stw 0, 4(1) stwu 1, -16(1) But under the PPC32 SVR4 ABI, access to negative offsets from R1 is not allowed. This was pointed out by Peter Bergner. llvm-svn: 157133	2012-05-19 21:52:55 +00:00
Jim Grosbach	2e62e2f664	Allow MCCodeEmitter access to the target MCRegisterInfo. Add the MCRegisterInfo to the factories and constructors. Patch by Tom Stellard <Tom.Stellard@amd.com>. llvm-svn: 156828	2012-05-15 17:35:52 +00:00
Roman Divacky	6c6b0716b9	Mark .opd @progbits, thus avoiding a warning from asm. llvm-svn: 156494	2012-05-09 18:24:23 +00:00
Jakob Stoklund Olesen	cc0cf22b98	Add an MF argument to TRI::getPointerRegClass() and TII::getRegClass(). The getPointerRegClass() hook can return register classes that depend on the calling convention of the current function (ptr_rc_tailcall). So far, we have been able to infer the calling convention from the subtarget alone, but as we add support for multiple calling conventions per target, that no longer works. Patch by Yiannis Tsiouris! llvm-svn: 156328	2012-05-07 22:10:26 +00:00
Jakob Stoklund Olesen	7bdae32bfd	Remove the SubRegClasses field from RegisterClass descriptions. This information is now computed by TableGen. llvm-svn: 156152	2012-05-04 03:30:34 +00:00
Bill Wendling	003b1bf46c	Change the PassManager from a reference to a pointer. The TargetPassManager's default constructor wants to initialize the PassManager to 'null'. But it's illegal to bind a null reference to a null l-value. Make the ivar a pointer instead. PR12468 llvm-svn: 155902	2012-05-01 08:27:43 +00:00
Preston Gurd	0a730de3c3	This patch fixes a problem which arose when using the Post-RA scheduler on X86 Atom. Some of our tests failed because the tail merging part of the BranchFolding pass was creating new basic blocks which did not contain live-in information. When the anti-dependency code in the Post-RA scheduler ran, it would sometimes rename the register containing the function return value because the fact that the return value was live-in to the subsequent block had been lost. To fix this, it is necessary to run the RegisterScavenging code in the BranchFolding pass. This patch makes sure that the register scavenging code is invoked in the X86 subtarget only when post-RA scheduling is being done. Post RA scheduling in the X86 subtarget is only done for Atom. This patch adds a new function to the TargetRegisterClass to control whether or not live-ins should be preserved during branch folding. This is necessary in order for the anti-dependency optimizations done during the PostRASchedulerList pass to work properly when doing Post-RA scheduling for the X86 in general and for the Intel Atom in particular. The patch adds and invokes the new function trackLivenessAfterRegAlloc() instead of using the existing requiresRegisterScavenging(). It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of requiresRegisterScavenging(). It changes all the targets that implemented requiresRegisterScavenging() to also implement trackLivenessAfterRegAlloc(). It adds an assertion in the Post RA scheduler to make sure that post RA liveness information is available when it is needed. It changes the X86 break-anti-dependencies test to use -mcpu=atom, in order to avoid running into the added assertion. Finally, this patch restores the use of anti-dependency checking (which was turned off temporarily for the 3.1 release) for Intel Atom in the Post RA scheduler. Patch by Andy Zhang! Thanks to Jakob and Anton for their reviews. llvm-svn: 155395	2012-04-23 21:39:35 +00:00
Gabor Greif	f1b29d4778	effectively back out my last change (r155190) llvm-svn: 155195	2012-04-20 11:41:38 +00:00
Gabor Greif	42a6b79fea	fix obviously bogus (IMO) operand index of the load in asserts (load only has one operand) and smuggle in some whitespace changes too NB: I am obviously testing the water here, and believe that the unguarded cast is still wrong, but why is the getZExtValue of the load's operand tested against zero here? Any review is appreciated. llvm-svn: 155190	2012-04-20 08:58:49 +00:00
Craig Topper	a0bf6c3af3	Convert some uses of XXXRegisterClass to &XXXRegClass. No functional change since they are equivalent. llvm-svn: 155186	2012-04-20 06:31:50 +00:00
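
The three conditions listed in the r158757 entry above (aggressive FMA formation) can be read as a single predicate. The following is only an illustrative sketch in Python, not LLVM code: `FmaOptions`, `should_form_fma`, and their parameters are hypothetical names standing in for the AllowExcessFPPrecision / UnsafeFPMath options, the result of TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT), and the number of users of the FMUL node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmaOptions:
    """Hypothetical stand-in for the two option flags named in the commit message."""
    allow_excess_fp_precision: bool = False  # -enable-excess-fp-precision
    unsafe_fp_math: bool = False             # -enable-unsafe-fp-math


def should_form_fma(opts: FmaOptions,
                    fma_faster_for_type: bool,
                    fmul_user_count: int) -> bool:
    """Return True when all three conditions from the commit message hold.

    (a) either option flag is set,
    (b) the target reports FMA as faster than mul+add for this type,
    (c) the FMUL has exactly one user (the FADD/FSUB being combined).
    """
    flags_ok = opts.allow_excess_fp_precision or opts.unsafe_fp_math
    return flags_ok and fma_faster_for_type and fmul_user_count == 1


if __name__ == "__main__":
    opts = FmaOptions(unsafe_fp_math=True)
    print(should_form_fma(opts, fma_faster_for_type=True, fmul_user_count=1))
    print(should_form_fma(opts, fma_faster_for_type=True, fmul_user_count=2))
```

Condition (c) matters because fusing an FMUL that has other users would leave the multiply in place anyway, so the combine would add an instruction rather than remove one.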

3119 Commits