mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-31 07:52:55 +01:00
Commit Graph

7889 Commits

Author SHA1 Message Date
Hal Finkel
f972e31b6c PPC: Allocate RS spill slot for unaligned i64 load/store
This fixes another bug found by llvm-stress!

If we happen to be doing an i64 load or store into a stack slot that has less
than a 4-byte alignment, then the frame-index elimination may need to use an
indexed load or store instruction (because the offset may not be a multiple of
4, a requirement of the STD/LD instructions). The extra register needed to hold
the offset comes from the register scavenger, and it is possible that the
scavenger will need to use an emergency spill slot. As a result, we need to
make sure that a spill slot is allocated when doing an i64 load/store into a
less-than-4-byte-aligned stack slot.
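
A rough stand-alone sketch of the encoding constraint behind this (hypothetical helper, not the backend code; the exact DS-form displacement range is an assumption here):

  #include <cassert>
  #include <cstdint>

  // PPC64 LD/STD are DS-form instructions whose 16-bit displacement must be
  // a multiple of 4.  A misaligned stack-slot offset therefore needs an
  // indexed (X-form) access and a scratch register from the scavenger,
  // which in turn may need an emergency spill slot.
  bool fitsDSForm(int64_t Offset) {
    return Offset % 4 == 0 && Offset >= -32768 && Offset <= 32764;
  }

  int main() {
    assert(fitsDSForm(40));   // aligned frame offset: plain LD/STD works
    assert(!fitsDSForm(42));  // under-aligned slot: X-form + scratch register
  }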

Because test cases for things like this tend to be fairly fragile, I've
concatenated a few small bugpoint-reduced test cases together to form the
regression test.

llvm-svn: 185907
2013-07-09 06:34:51 +00:00
Bill Wendling
f81cb5b9ed Stop emitting weak symbols into the "coal" sections.
The Mach-O linker has been able to support the weak-def bit on any symbol for
quite a while now. The compiler however continued to place these symbols into a
"coal" section, which required the linker to map them back to the base section
name.

Replace the sections like this:

  __TEXT/__textcoal_nt   instead use  __TEXT/__text
  __TEXT/__const_coal    instead use  __TEXT/__const
  __DATA/__datacoal_nt   instead use  __DATA/__data

<rdar://problem/14265330>

llvm-svn: 185872
2013-07-08 21:34:52 +00:00
Ulrich Weigand
cb20efc341 [PowerPC] Always use "assembler dialect" 1
A setting in MCAsmInfo defines the "assembler dialect" to use.  This is used
by common code to choose between alternatives in a multi-alternative GNU
inline asm statement like the following:

  __asm__ ("{sfe|subfe} %0,%1,%2" : "=r" (out) : "r" (in1), "r" (in2));

The meaning of these dialects is platform specific, and GCC defines those
for PowerPC to use dialect 0 for old-style (POWER) mnemonics and 1 for
new-style (PowerPC) mnemonics, like in the example above.

To be compatible with inline asm used with GCC, LLVM ought to do the same.
Specifically, this means we should always use assembler dialect 1 since
old-style mnemonics really aren't supported on any current platform.

However, the current LLVM back-end uses:
  AssemblerDialect = 1;           // New-Style mnemonics.
in PPCMCAsmInfoDarwin, and
  AssemblerDialect = 0;           // Old-Style mnemonics.
in PPCLinuxMCAsmInfo.

The Linux setting really isn't correct; we should be using new-style
mnemonics everywhere.  This commit changes that.

Unfortunately, the setting of this variable is overloaded in the back-end
to decide whether or not we are on a Darwin target.  This is done in
PPCInstPrinter (the "SyntaxVariant" is initialized from the MCAsmInfo
AssemblerDialect setting), and also in PPCMCExpr.  Setting AssemblerDialect
to 1 for both Darwin and Linux no longer allows us to make this distinction.

Instead, this patch uses the MCSubtargetInfo passed to createPPCMCInstPrinter
to distinguish Darwin targets, and ignores the SyntaxVariant parameter.
As to PPCMCExpr, this patch adds an explicit isDarwin argument that needs
to be passed in by the caller when creating a target MCExpr.  (To do so
this patch implicitly also reverts commit 184441.)
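
The selection mechanism can be pictured with a small stand-alone sketch; pickDialect is a hypothetical helper, not the MC code, and only shows how the AssemblerDialect index chooses between the "{old|new}" alternatives:

  #include <cassert>
  #include <iostream>
  #include <string>

  // Picks the Dialect-th alternative from each "{a|b|...}" group of an
  // inline-asm template string.
  std::string pickDialect(const std::string &Template, unsigned Dialect) {
    std::string Out;
    for (size_t I = 0; I < Template.size(); ++I) {
      if (Template[I] != '{') { Out += Template[I]; continue; }
      size_t End = Template.find('}', I);
      assert(End != std::string::npos && "unterminated alternative group");
      std::string Group = Template.substr(I + 1, End - I - 1);
      size_t Pos = 0;
      for (unsigned D = 0; D < Dialect; ++D)
        Pos = Group.find('|', Pos) + 1;        // skip earlier alternatives
      size_t Next = Group.find('|', Pos);
      Out += Group.substr(Pos, Next - Pos);    // npos count is clamped by substr
      I = End;
    }
    return Out;
  }

  int main() {
    std::string T = "{sfe|subfe} %0,%1,%2";
    std::cout << pickDialect(T, 0) << "\n";  // "sfe %0,%1,%2"   (old-style)
    std::cout << pickDialect(T, 1) << "\n";  // "subfe %0,%1,%2" (new-style)
  }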

llvm-svn: 185858
2013-07-08 20:20:51 +00:00
Hal Finkel
c4d29e61ee PPC: Mark vector CC action for SETO and SETONE as Expand
Another bug found by llvm-stress! This fixes hitting
  llvm_unreachable("Invalid integer vector compare condition");
at the end of getVCmpInst in PPCISelDAGToDAG.

llvm-svn: 185855
2013-07-08 20:00:03 +00:00
Joey Gouly
b4f59412fd Add a comment to this change, requested by Eric Christopher.
llvm-svn: 185853
2013-07-08 19:52:51 +00:00
Jim Grosbach
b4234c1d88 ARM: Improve codegen for generic vselect.
Fall back to by-element insert rather than building it up on the stack.

rdar://14351991

llvm-svn: 185846
2013-07-08 18:18:52 +00:00
Hal Finkel
059614de7f PPC: Mark vector FREM as Expand by default
Another bug found by llvm-stress! This fixes crashing with:
  LLVM ERROR: Cannot select: v4f32 = frem ...

llvm-svn: 185840
2013-07-08 17:30:25 +00:00
Bill Schmidt
58913550ff [PowerPC] Fix PR16556 (handle undef ppcf128 in LowerFP_TO_INT).
PPCTargetLowering::LowerFP_TO_INT() expects its source operand to be
either an f32 or f64, but this is not checked.  A long double
(ppcf128) operand will normally be custom-lowered to a conversion to
f64 in this context.  However, this isn't the case for an UNDEF node.

This patch recognizes a ppcf128 as a legal source operand for
FP_TO_INT only if it's an undef, in which case it creates an undef of
the target type.

At some point we might want to do a wholesale custom lowering of
ISD::UNDEF when the type is ppcf128, but it's not really clear that's
a great idea, and probably more work than it's worth for a situation
that only arises in the case of a programming error.  At this point I
think simple is best.

The test case comes from PR16556, and is a crash-test only.

llvm-svn: 185821
2013-07-08 14:22:45 +00:00
Nico Rieck
7230f0d23f Reuse %rax after calling __chkstk on win64
Reapply this as I reverted the wrong commit.

llvm-svn: 185807
2013-07-08 11:20:11 +00:00
Nico Rieck
90555b76a0 Revert "Proper va_arg/va_copy lowering on win64"
This reverts commit 2b52880592a525cfe04d8f9008a35da8c2ea94c3.

Needs review.

llvm-svn: 185806
2013-07-08 11:19:44 +00:00
Richard Sandiford
537b8d7bec [SystemZ] Use MVC for memcpy
Use MVC for memcpy in cases where a single MVC is enough.  Using MVC is
a win for longer copies too, but I'll leave that for later.

llvm-svn: 185802
2013-07-08 09:35:23 +00:00
Hal Finkel
b21ca286dc Fix PromoteIntRes_BUILD_VECTOR crash with i1 vectors
This fixes a bug (found by llvm-stress) in
DAGTypeLegalizer::PromoteIntRes_BUILD_VECTOR where it assumed that the result
type would always be larger than the original operands. This is not always
true, however, with boolean vectors. For example, promoting a node of type v8i1
(where the operands will be of type i32, the type to which i1 is promoted) will
yield a node with a result vector element type of i16 (and operands of type
i32). As a result, we cannot blindly assume that we can ANY_EXTEND the operands
to the result type.
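
A minimal sketch of the rule the fix introduces (a hypothetical helper, not the legalizer itself): compare the operand's already-promoted width against the promoted result element width and truncate when the latter is smaller.

  #include <iostream>

  // Decide how to adjust an operand that was promoted to OpBits so it
  // matches a promoted result element of ResEltBits.
  const char *adjustOperand(unsigned OpBits, unsigned ResEltBits) {
    if (ResEltBits > OpBits) return "ANY_EXTEND"; // the old, blind assumption
    if (ResEltBits < OpBits) return "TRUNCATE";   // needed for e.g. v8i1
    return "no-op";
  }

  int main() {
    // BUILD_VECTOR of v8i1: the i1 operands were promoted to i32, but the
    // promoted result is v8i16, so each operand must be truncated to i16.
    std::cout << adjustOperand(/*OpBits=*/32, /*ResEltBits=*/16) << "\n";
  }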

llvm-svn: 185794
2013-07-08 06:16:58 +00:00
Nico Rieck
cd7bc94022 Revert "Reuse %rax after calling __chkstk on win64"
This reverts commit 01f8d579f7672872324208ac5bc4ac311e81b22e.

llvm-svn: 185781
2013-07-08 01:30:57 +00:00
Nico Rieck
15089c31ec Reuse %rax after calling __chkstk on win64
llvm-svn: 185778
2013-07-07 16:48:39 +00:00
Nico Rieck
f5c31a8456 Proper va_arg/va_copy lowering on win64
llvm-svn: 185763
2013-07-06 18:08:19 +00:00
Benjamin Kramer
5d9e616519 DAGCombiner: Don't drop extension behavior when shrinking a load when unsafe.
ReduceLoadWidth unconditionally drops extensions from loads. Limit it to the
case when all of the bits the extension would otherwise produce are dropped by
the shrink. It would be possible to shrink the load in more cases by merging
the extensions, but this isn't trivial and it's a very rare case. I left a TODO
for that case.

Fixes PR16551.

llvm-svn: 185755
2013-07-06 14:05:09 +00:00
Tim Northover
696f647891 Stop putting operations after a tail call.
This prevents the emission of DAG-generated vreg definitions after a
tail call by dropping them entirely (on the grounds that nothing could
use them anyway, and they interfere with O0 CodeGen).

llvm-svn: 185754
2013-07-06 12:58:45 +00:00
Arnold Schwaighofer
97cea9b991 ARM: Add a pack pattern for matching arithmetic shift right
llvm-svn: 185714
2013-07-05 18:57:49 +00:00
Arnold Schwaighofer
d5fc888196 ARM: Fix incorrect pack pattern
A "pkhtb x, x, y asr #num" uses the lower 16 bits of "y asr #num" and packs them
in the bottom half of "x". An arithmetic and logic shift are only equivalent in
this context if the shift amount is 16. We would be shifting in ones into the
bottom 16bits instead of zeros if "y" is negative.
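
A scalar demonstration of the difference (assuming the usual arithmetic behaviour of >> on negative signed values, which all relevant targets have):

  #include <cstdint>
  #include <cstdio>

  // Low 16 bits of "y asr #n" vs "y lsr #n" for a negative y: they agree at
  // n == 16, but a larger shift drags sign bits into the bottom half.
  int main() {
    int32_t y = -0x12345678;
    for (unsigned n : {16u, 20u}) {
      uint16_t asr = (uint16_t)(y >> n);            // arithmetic shift
      uint16_t lsr = (uint16_t)((uint32_t)y >> n);  // logical shift
      std::printf("n=%u asr=%#06x lsr=%#06x %s\n", n, asr, lsr,
                  asr == lsr ? "equal" : "DIFFERENT");
    }
  }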

radar://14338767

llvm-svn: 185712
2013-07-05 18:28:39 +00:00
Richard Sandiford
c0fe83c1b6 [SystemZ] Remove no-op MVCs
The stack coloring pass has code to delete stores and loads that become
trivially dead after coloring.  Extend it to cope with single instructions
that copy from one frame index to another.

The testcase happens to show an example of this kicking in at the moment.
It did occur in Real Code too though.

llvm-svn: 185705
2013-07-05 14:38:48 +00:00
Richard Sandiford
8e414e2aef Fix double renaming bug in stack coloring pass
The stack coloring pass renumbered frame indexes with a loop of the form:

  for each frame index FI
    for each instruction I that uses FI
      for each use of FI in I
        rename FI to FI'

This caused problems if an instruction used two frame indexes F0 and F1
and if F0 was renamed to F1 and F1 to F2.  The first time we visited the
instruction we changed F0 to F1, then we changed both F1s to F2.

In other words, the problem was that SSRefs recorded which instructions
used an FI, but not which MachineOperands and MachineMemOperands within
that instruction used it.

This is easily fixed for MachineOperands by walking the instructions
once and processing each operand in turn.  There's already a loop to
do that for dead store elimination, so it seemed more efficient to
fuse the two at the block level.

MachineMemOperands are more tricky because they can be shared between
instructions.  The patch handles them by making SSRefs an array of
MachineMemOperands rather than an array of MachineInstrs.  We might end
up processing the same MachineMemOperand twice, but that's OK because
we always know from the SSRefs index what the original frame index was.
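
The difference between the two schemes can be reproduced with a toy model (an "instruction" is just a list of frame-index operands; the names are invented for illustration, this is not the pass itself):

  #include <cassert>
  #include <map>
  #include <vector>

  using Inst = std::vector<int>;

  // Old scheme: apply one FI->FI mapping at a time over all instructions,
  // so an operand renamed to F1 can later be renamed again to F2.
  void renamePerIndex(std::vector<Inst> &Insts, const std::map<int, int> &M) {
    for (auto &KV : M)
      for (Inst &I : Insts)
        for (int &Op : I)
          if (Op == KV.first) Op = KV.second;
  }

  // Fixed scheme: walk the instructions once and rewrite each operand
  // exactly once using the full map.
  void renamePerOperand(std::vector<Inst> &Insts, const std::map<int, int> &M) {
    for (Inst &I : Insts)
      for (int &Op : I) {
        auto It = M.find(Op);
        if (It != M.end()) Op = It->second;
      }
  }

  int main() {
    std::map<int, int> M = {{0, 1}, {1, 2}};   // F0 -> F1, F1 -> F2
    std::vector<Inst> A = {{0, 1}}, B = {{0, 1}};
    renamePerIndex(A, M);    // F0 becomes F1, then *both* F1s become F2
    renamePerOperand(B, M);  // F0 -> F1 and F1 -> F2, as intended
    assert((A[0] == Inst{2, 2}));
    assert((B[0] == Inst{1, 2}));
  }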

llvm-svn: 185703
2013-07-05 14:24:47 +00:00
Richard Sandiford
d84ed5f34a [SystemZ] Enable the use of MVC for frame-to-frame spills
...now that the problem that prompted the restriction has been fixed.

The original spill-02.py was a compromise because at the time I couldn't
find an example that actually failed without the two scavenging slots.
The version included here did.

llvm-svn: 185701
2013-07-05 14:02:01 +00:00
Richard Sandiford
acd92ea1e1 [SystemZ] Allocate a second register scavenging slot
This is another prerequisite for frame-to-frame MVC copies.
I'll commit the patch that makes use of the slot separately.

The downside of trying to test many corner cases with each of the
available addressing modes is that a fair few tests need to account
for the new frame layout.  I do still think it's useful to have all
these tests though, since it's something that wouldn't get much coverage
otherwise.

llvm-svn: 185698
2013-07-05 13:11:52 +00:00
Joey Gouly
76f34b0ffb PR16490: fix a crash in ARMDAGToDAGISel::SelectInlineAsm.
In the SelectionDAG immediate operands to inline asm are constructed as
two separate operands. The first is a constant of value InlineAsm::Kind_Imm
and the second is a constant with the value of the immediate.

In ARMDAGToDAGISel::SelectInlineAsm, if we reach an operand of Kind_Imm we
should skip over the next operand too.
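
A toy model of the operand walk (the flat list layout here is a mock, not the real SDNode operand encoding):

  #include <cstdio>
  #include <vector>

  enum { Kind_Reg = 1, Kind_Imm = 2 };

  int main() {
    // (flag, payload) pairs flattened into one operand list: an immediate is
    // a Kind_Imm flag followed by the immediate value itself.
    std::vector<int> Ops = {Kind_Reg, /*vreg*/ 100, Kind_Imm, /*imm*/ 42};
    for (size_t I = 0; I < Ops.size(); ++I) {
      if (Ops[I] == Kind_Imm) {
        std::printf("immediate operand %d\n", Ops[I + 1]);
        ++I;                     // the fix: also skip the value slot
      } else if (Ops[I] == Kind_Reg) {
        std::printf("register operand %%vreg%d\n", Ops[I + 1]);
        ++I;
      }
    }
  }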

llvm-svn: 185688
2013-07-05 10:19:40 +00:00
Quentin Colombet
49190aa8d1 [ARM] Improve the instruction selection of vector loads.
In the ARM back-end, build_vector nodes are lowered to a target-specific
build_vector that uses a floating point type. 
This works well, unless the inserted bitcasts survive until instruction
selection. In that case, they incur moves between integer unit and floating
point unit that may result in inefficient code.

In other words, this conversion may introduce artificial dependencies when the
code leading to the build vector cannot be completed with a floating point type.

In particular, this happens when loads are not aligned.

Before this patch, in that case, the compiler generates general purpose loads
and creates the floating point vector from them, instead of directly using the
vector unit.

The patch uses a vector-friendly sequence of code when the inserted bitcasts to
floating point survive DAGCombine.

This is done by a target-specific DAGCombine that changes the target-specific
build_vector into a sequence of insert_vector_elt nodes that gets rid of the bitcasts.

<rdar://problem/14170854>

llvm-svn: 185587
2013-07-03 21:42:57 +00:00
Ulrich Weigand
a5490843a1 [PowerPC] Use mtocrf when available
Just as with mfocrf, it is also preferable to use mtocrf instead of
mtcrf when only a single CR register is to be written.

Current code however always emits mtcrf.  This probably does not matter
when using an external assembler, since the GNU assembler will in fact
automatically replace mtcrf with mtocrf when possible.  It does create
inefficient code with the integrated assembler, however.

To fix this, this patch adds MTOCRF/MTOCRF8 instruction patterns and
uses those instead of MTCRF/MTCRF8 everywhere.  Just as done in the
MFOCRF patch committed as 185556, these patterns will be converted
back to MTCRF if MTOCRF is not available on the machine.

As a side effect, this allows us to modify the MTCRF pattern to accept
the full range of mask operands for the benefit of the asm parser.

llvm-svn: 185561
2013-07-03 17:59:07 +00:00
Rafael Espindola
006ee945bb Prefix failing commands with not to make clear they are expected to fail.
llvm-svn: 185554
2013-07-03 16:41:29 +00:00
Rafael Espindola
4af20b9fde Remove another old test.
It was only passing because 'grep andpd' was not finding any andpd, but
we don't fail if part of a pipe fails.

llvm-svn: 185552
2013-07-03 16:35:26 +00:00
Rafael Espindola
6cccbc7b17 Remove test for the old EH system. It doesn't parse anymore.
llvm-svn: 185551
2013-07-03 16:30:01 +00:00
Richard Sandiford
c7495a0fca [SystemZ] Fold more spills
Add a mapping from register-based <INSN>R instructions to the corresponding
memory-based <INSN>.  Use it to cut down on the number of spill loads.

Some instructions extend their operands from smaller fields, so this
required a new TSFlags field to say how big the unextended operand is.

This optimisation doesn't trigger for C(G)R and CL(G)R because in practice
we always combine those instructions with a branch.  Adding a test for every
other case probably seems excessive, but it did catch a missed optimisation
for DSGF (fixed in r185435).

llvm-svn: 185529
2013-07-03 10:10:02 +00:00
Tim Northover
f4b07c69d1 ARM: relax the atomic release barrier to "dmb ishst" on Swift
Swift cores implement store barriers that are stronger than the ARM
specification but weaker than general barriers. They are, in fact, just about
enough to provide the ordering needed for atomic operations with release
semantics.

This patch makes use of that quirk.

llvm-svn: 185527
2013-07-03 09:20:36 +00:00
Richard Osborne
207824e7f8 [XCore] Add ISel pattern for LDWCP
Patch by Robert Lytton.

llvm-svn: 185518
2013-07-03 07:48:50 +00:00
Ulrich Weigand
042ff673b7 [PowerPC] Remove VK_PPC_TLSGD and VK_PPC_TLSLD
The PowerPC-specific modifiers VK_PPC_TLSGD and VK_PPC_TLSLD
correspond exactly to the generic modifiers VK_TLSGD and VK_TLSLD.
This causes some confusion with the asm parser, since VK_PPC_TLSGD
is output as @tlsgd, which is then read back in as VK_TLSGD.

To avoid this confusion, this patch removes the PowerPC-specific
modifiers and uses the generic modifiers throughout.  (The only
drawback is that the generic modifiers are printed in upper case
while the usual convention on PowerPC is to use lower-case modifiers.
But this is just a cosmetic issue.)

llvm-svn: 185476
2013-07-02 21:29:06 +00:00
Richard Sandiford
750b064fa2 [SystemZ] Use DSGFR over DSGR in more cases
Fixes some cases where we were using full 64-bit division for (sdiv i32, i32)
and (sdiv i64, i32).

The "32" in "SDIVREM32" just refers to the second operand.  The first operand
of all *DIVREM*s is a GR128.

llvm-svn: 185435
2013-07-02 15:40:22 +00:00
Richard Sandiford
33deb195f9 [SystemZ] Use MVC to spill loads and stores
Try to use MVC when spilling the destination of a simple load or the source
of a simple store.  As explained in the comment, this doesn't yet handle
the case where the load or store location is also a frame index, since
that could lead to two simultaneous scavenger spills, something the
backend can't handle yet.  spill-02.py tests that this restriction kicks in,
but unfortunately I've not yet found a case that would fail without it.
The volatile trick I used for other scavenger tests doesn't work here
because we can't use MVC for volatile accesses anyway.

I'm planning on relaxing the restriction later, hopefully with a test
that does trigger the problem...

Tests @f8 and @f9 also showed that L(G)RL and ST(G)RL were wrongly
classified as SimpleBDX{Load,Store}.  It wouldn't be easy to test for
that bug separately, which is why I didn't split out the fix as a
separate patch.

llvm-svn: 185434
2013-07-02 15:28:56 +00:00
Richard Osborne
ad449c14dd [XCore] Fix instruction selection for zext, mkmsk instructions.
r182680 replaced CountLeadingZeros_32 with a template function
countLeadingZeros that relies on using the correct argument type to give
the right result. The type passed in the XCore backend after this
revision was incorrect in a couple of places.

Patch by Robert Lytton.

llvm-svn: 185430
2013-07-02 14:46:34 +00:00
Tim Northover
d0b90ac5a7 DAGCombiner: fix use-counting issue when forming zextload
DAGCombiner was counting all uses of a load node  when considering whether it's
worth combining into a zextload. Really, it wants to ignore the chain and just
count real uses.
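
The distinction, in a self-contained mock (result 0 is the loaded value, result 1 the chain token; these structs are illustrative, not the SelectionDAG API):

  #include <cassert>
  #include <vector>

  struct Use { unsigned ResultNo; };

  unsigned countAllUses(const std::vector<Use> &Uses) { return Uses.size(); }

  // Only users of the loaded value matter when deciding whether the load is
  // "shared"; chain-only users should be ignored.
  unsigned countValueUses(const std::vector<Use> &Uses) {
    unsigned N = 0;
    for (const Use &U : Uses)
      if (U.ResultNo == 0)
        ++N;
    return N;
  }

  int main() {
    // One real consumer of the value plus two chain-only users (e.g. a token
    // factor and a store ordered after the load).
    std::vector<Use> Uses = {{0}, {1}, {1}};
    assert(countAllUses(Uses) == 3);    // old count: the load looks too shared
    assert(countValueUses(Uses) == 1);  // what the zextload combine wants
  }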

rdar://problem/13896307

llvm-svn: 185419
2013-07-02 09:58:53 +00:00
Hal Finkel
4eba1c5685 Cleanup PPC Altivec registers in CSR lists and improve VRSAVE handling
There are a couple of (small) related changes here:

1. The printed name of the VRSAVE register has been changed from VRsave to
vrsave in order to match the name accepted by GNU binutils.

2. Support for parsing vrsave has been added to the asm parser (it seems that
there was no test case specifically covering this code, so I've added one).

3. The list of Altivec registers, which was common to all calling conventions,
has been separated out. This allows us to define the base CSR lists, and then
lists for each ABI with Altivec included. This allows SjLj, for example, to
work correctly on non-Altivec targets without using unnatural definitions of
the NoRegs CSR list.

4. VRSAVE is now always reserved on non-Darwin targets and all Altivec
registers are reserved when Altivec is disabled.

With these changes, it is now possible to compile a function containing
__builtin_unwind_init() on Linux/PPC64 with debugging information. This did not
work previously because GNU binutils assumes that all .cfi_offset offsets will
be 8-byte aligned on PPC64 (and errors out if you provide a non-8-byte-aligned
offset). This is not true for the vrsave register, however; because this
register is used only on Darwin, GCC does not bother printing a .cfi_offset
entry for it (even though there is a slot in the stack frame for it as
specified by the ABI). This change allows us to do the same: we will also not
print .cfi_offset directives for vrsave.

llvm-svn: 185409
2013-07-02 03:39:34 +00:00
Bill Schmidt
4e099704b5 Index: test/CodeGen/PowerPC/reloc-align.ll
===================================================================
--- test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
+++ test/CodeGen/PowerPC/reloc-align.ll	(revision 0)
@@ -0,0 +1,34 @@
+; RUN: llc -mcpu=pwr7 -O1 < %s | FileCheck %s
+
+; This test verifies that the peephole optimization of address accesses
+; does not produce a load or store with a relocation that can't be
+; satisfied for a given instruction encoding.  Reduced from a test supplied
+; by Hal Finkel.
+
+target datalayout = "E-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f128:128:128-v128:128:128-n32:64"
+target triple = "powerpc64-unknown-linux-gnu"
+
+%struct.S1 = type { [8 x i8] }
+
+@main.l_1554 = internal global { i8, i8, i8, i8, i8, i8, i8, i8 } { i8 -1, i8 -6, i8 57, i8 62, i8 -48, i8 0, i8 58, i8 80 }, align 1
+
+; Function Attrs: nounwind readonly
+define signext i32 @main() #0 {
+entry:
+  %call = tail call fastcc signext i32 @func_90(%struct.S1* byval bitcast ({ i8, i8, i8, i8, i8, i8, i8, i8 }* @main.l_1554 to %struct.S1*))
+; CHECK-NOT: ld {{[0-9]+}}, main.l_1554@toc@l
+  ret i32 %call
+}
+
+; Function Attrs: nounwind readonly
+define internal fastcc signext i32 @func_90(%struct.S1* byval nocapture %p_91) #0 {
+entry:
+  %0 = bitcast %struct.S1* %p_91 to i64*
+  %bf.load = load i64* %0, align 1
+  %bf.shl = shl i64 %bf.load, 26
+  %bf.ashr = ashr i64 %bf.shl, 54
+  %bf.cast = trunc i64 %bf.ashr to i32
+  ret i32 %bf.cast
+}
+
+attributes #0 = { nounwind readonly "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
Index: lib/Target/PowerPC/PPCAsmPrinter.cpp
===================================================================
--- lib/Target/PowerPC/PPCAsmPrinter.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCAsmPrinter.cpp	(working copy)
@@ -679,7 +679,26 @@ void PPCAsmPrinter::EmitInstruction(const MachineI
       OutStreamer.EmitRawText(StringRef("\tmsync"));
       return;
     }
+    break;
+  case PPC::LD:
+  case PPC::STD:
+  case PPC::LWA: {
+    // Verify alignment is legal, so we don't create relocations
+    // that can't be supported.
+    // FIXME:  This test is currently disabled for Darwin.  The test
+    // suite shows a handful of test cases that fail this check for
+    // Darwin.  Those need to be investigated before this sanity test
+    // can be enabled for those subtargets.
+    if (!Subtarget.isDarwin()) {
+      unsigned OpNum = (MI->getOpcode() == PPC::STD) ? 2 : 1;
+      const MachineOperand &MO = MI->getOperand(OpNum);
+      if (MO.isGlobal() && MO.getGlobal()->getAlignment() < 4)
+        llvm_unreachable("Global must be word-aligned for LD, STD, LWA!");
+    }
+    // Now process the instruction normally.
+    break;
   }
+  }
 
   LowerPPCMachineInstrToMCInst(MI, TmpInst, *this);
   OutStreamer.EmitInstruction(TmpInst);
Index: lib/Target/PowerPC/PPCISelDAGToDAG.cpp
===================================================================
--- lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(revision 185327)
+++ lib/Target/PowerPC/PPCISelDAGToDAG.cpp	(working copy)
@@ -1530,6 +1530,14 @@ void PPCDAGToDAGISel::PostprocessISelDAG() {
       if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
         SDLoc dl(GA);
         const GlobalValue *GV = GA->getGlobal();
+        // We can't perform this optimization for data whose alignment
+        // is insufficient for the instruction encoding.
+        if (GV->getAlignment() < 4 &&
+            (StorageOpcode == PPC::LD || StorageOpcode == PPC::STD ||
+             StorageOpcode == PPC::LWA)) {
+          DEBUG(dbgs() << "Rejected this candidate for alignment.\n\n");
+          continue;
+        }
         ImmOpnd = CurDAG->getTargetGlobalAddress(GV, dl, MVT::i64, 0, Flags);
       } else if (ConstantPoolSDNode *CP =
                  dyn_cast<ConstantPoolSDNode>(ImmOpnd)) {

llvm-svn: 185380
2013-07-01 20:52:27 +00:00
Akira Hatanaka
a704ad94d7 [mips] Fix test case to check that mips64 instructions are generated.
llvm-svn: 185371
2013-07-01 20:18:58 +00:00
Anton Korobeynikov
a6a4ed92ee Really fix the test. Sorry for the breakage...
llvm-svn: 185369
2013-07-01 19:51:36 +00:00
Anton Korobeynikov
8e5dfb5ca4 Fix the test which relies on uncommitted change
llvm-svn: 185368
2013-07-01 19:50:31 +00:00
Anton Korobeynikov
f6e50fd093 Add jump tables handling for MSP430.
Patch by Job Noorman!

llvm-svn: 185364
2013-07-01 19:44:44 +00:00
Cameron Zwarich
7af5158621 Fix PR16508.
When phis get lowered, destination copies are inserted using an iterator that is
determined once for all phis in the block, which BuildMI interprets as a request
to insert an instruction directly before the iterator. In the case of a cyclic
phi, source copies may also be inserted directly before this iterator, which can
cause source copies to be inserted before destination copies. The fix is to keep
an iterator to the last phi and then advance it while lowering each phi in order
to insert destination copies directly after the phis.

llvm-svn: 185363
2013-07-01 19:42:46 +00:00
Hal Finkel
0b854b8c04 Don't form PPC CTR loops for over-sized exit counts
Although you can't generate this from C on PPC64, if you have a loop using a
64-bit counter on PPC32 then you can't form a CTR-based loop for it. This had
been causing the PPCCTRLoops pass to assert.

Thanks to Joerg Sonnenberger for providing a test case!

llvm-svn: 185361
2013-07-01 19:34:59 +00:00
Tim Northover
c1348880dc AArch64: correct CodeGen of MOVZ/MOVK combinations.
According to the AArch64 ELF specification (4.6.8), it's the
assembler's responsibility to make sure the shift amount is correct in
relocated MOVZ/MOVK instructions.

This wasn't being obeyed by either the MCJIT CodeGen or RuntimeDyldELF
(which happened to work out well for JIT tests). This commit should
make us compliant in this area.

llvm-svn: 185360
2013-07-01 19:23:10 +00:00
Tim Northover
51fd747de9 Revert r185339 (ARM: relax the atomic release barrier to "dmb ishst")
Turns out I'd misread the architecture reference manual and thought
that was a load/store-store barrier, when it's not.

Thanks for pointing it out Eli!

llvm-svn: 185356
2013-07-01 18:37:33 +00:00
Tim Northover
25286e5b71 ARM: relax the atomic release barrier to "dmb ishst"
I believe the full "dmb ish" barrier is not required to guarantee release
semantics for atomic operations. The weaker "dmb ishst" prevents previous
operations being reordered with a store executed afterwards, which is enough.

A key point to note (fortunately already correct) is that this barrier alone is
*insufficient* for sequential consistency, no matter how liberally placed.
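
In C++ terms the property being relied on is store-release ordering, sketched below (this shows the language-level contract, not the generated barrier sequence itself):

  #include <atomic>
  #include <cassert>
  #include <thread>

  std::atomic<int> flag{0};
  int payload = 0;

  // A release store only has to keep earlier writes from moving past it --
  // the store-store ordering that a "dmb ishst" provides.
  void publish(int v) {
    payload = v;                               // ordinary store
    flag.store(1, std::memory_order_release);  // store-release
  }

  int main() {
    std::thread t(publish, 42);
    while (flag.load(std::memory_order_acquire) == 0) {
    }
    assert(payload == 42);  // guaranteed by the acquire/release pairing
    t.join();
  }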

llvm-svn: 185339
2013-07-01 14:48:48 +00:00
Justin Holewinski
d88c5d6e19 [NVPTX] Add support for module-scope inline asm
Since we were explicitly not calling AsmPrinter::doInitialization,
any module-scope inline asm was not being printed.

llvm-svn: 185336
2013-07-01 13:00:14 +00:00
Justin Holewinski
89a1f98197 [NVPTX] 64-bit ADDC/ADDE are not legal
llvm-svn: 185333
2013-07-01 12:59:04 +00:00
Justin Holewinski
6284a5cea6 [NVPTX] Fix vector loads from parameters that span multiple loads, and fix some typos
llvm-svn: 185332
2013-07-01 12:59:01 +00:00
Justin Holewinski
77ba2f5ed9 [NVPTX] Handle signext/zeroext attributes properly
Fix a case where we were incorrectly sign-extending a value when we should have been zero-extending the value.

Also change some SIGN_EXTEND to ANY_EXTEND because we really don't care and may have more opportunity to fold subexpressions.

llvm-svn: 185331
2013-07-01 12:58:58 +00:00
Justin Holewinski
46fe052e1f [NVPTX] Add support for native SIGN_EXTEND_INREG where available
llvm-svn: 185330
2013-07-01 12:58:56 +00:00
Justin Holewinski
46d3cad4d8 [NVPTX] Add isel patterns for [reg+offset] form of ldg/ldu.
llvm-svn: 185329
2013-07-01 12:58:52 +00:00
Justin Holewinski
c254f0f839 [NVPTX] Make sure we zero out high-order 24 bits for 8-bit load into 32-bit value
llvm-svn: 185328
2013-07-01 12:58:48 +00:00
Vincent Lejeune
4cef82fa31 R600: Support schedule and packetization of trans-only inst
llvm-svn: 185268
2013-06-29 19:32:43 +00:00
Hal Finkel
055ca2ecc9 PPC: Ignore spill/restore requests for VRSAVE (except on Darwin)
This fixes PR16418, which reports that a function calling
__builtin_unwind_init() asserts. The cause is that this generates a
spill/restore for VRSAVE, and we support that only on Darwin (because VRSAVE is
only really used on Darwin).

The test case checks only that we don't crash. We can add correctness checks
once someone verifies what behavior the function is supposed to have.

llvm-svn: 185235
2013-06-28 22:29:56 +00:00
Hal Finkel
49c072c532 Fix CodeGen/PowerPC/stack-protector.ll on OpenBSD
On OpenBSD, the stack-smash protection transform uses "__guard_local"
and "__stack_smash_handler" instead of "__stack_chk_guard" and
"__stack_chk_fail".  However, CodeGen/PowerPC/stack-protector.ll
doesn't specify a target OS, so on OpenBSD it fails.

Add -mtriple=ppc32-unknown-linux to make the test host-OS agnostic. While
there, convert to FileCheck.

Patch by Matthew Dempsky.

llvm-svn: 185206
2013-06-28 20:18:14 +00:00
Hal Finkel
7f9144ae20 Fix a PPC rlwimi instruction-selection bug
Under certain (evidently rare) circumstances, this code used to convert OR(a,
AND(x, y)) into OR(a, x). This was incorrect.

While there, I've added a comment to the code immediately above.
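
A one-line counterexample makes the problem concrete: any bit set in x but clear in both a and y distinguishes the two expressions.

  #include <cassert>
  #include <cstdint>

  int main() {
    uint32_t a = 0x0, x = 0x2, y = 0x0;
    uint32_t original = a | (x & y);  // OR(a, AND(x, y)) == 0x0
    uint32_t folded   = a | x;        // OR(a, x)         == 0x2
    assert(original != folded);       // the old fold changed the result
  }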

llvm-svn: 185201
2013-06-28 20:00:07 +00:00
Lang Hames
e8b20c4f7b Add missing case to switch statement - DAGTypeLegalizer::ExpandIntegerResult
should expand ATOMIC_CMP_SWAP nodes the same way that it does for ATOMIC_SWAP.

Since ATOMIC_LOADs on some targets (e.g. older ARM variants) get legalized to
ATOMIC_CMP_SWAPs, the missing case had been causing i64 atomic loads to crash
during isel.
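
The shape of the fix, as a minimal mock (the enum and handler are illustrative stand-ins, not the legalizer's actual names):

  #include <cstdio>

  enum Opcode { ATOMIC_SWAP, ATOMIC_CMP_SWAP, OTHER };

  void expandIntegerResult(Opcode Op) {
    switch (Op) {
    case ATOMIC_SWAP:
    case ATOMIC_CMP_SWAP:   // the previously missing case
      std::puts("expand i64 result via the atomic expansion path");
      break;
    default:
      std::puts("error: do not know how to expand this node");
    }
  }

  int main() {
    expandIntegerResult(ATOMIC_CMP_SWAP);  // no longer hits the default
  }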

<rdar://problem/14074644>

llvm-svn: 185186
2013-06-28 18:36:42 +00:00
Justin Holewinski
434a514175 [NVPTX] Add (1.0 / sqrt(x)) => rsqrt(x) generation when allowable by FP flags
llvm-svn: 185178
2013-06-28 17:58:13 +00:00
Justin Holewinski
f17855a9dc [NVPTX] Calling conventions fix
Fix ABI handling for a function
returning bool -- use st.param.b32 to return the value
and use ld.param.b32 in the caller to load the return value.

llvm-svn: 185177
2013-06-28 17:58:10 +00:00
Justin Holewinski
6feb5e8392 [NVPTX] Add support for cttz/ctlz/ctpop
llvm-svn: 185176
2013-06-28 17:58:07 +00:00
Justin Holewinski
d365a376eb [NVPTX] Clean up comparison/select/convert patterns and factor out PTX instructions from their patterns
No new test case; the existing tests should show no breakage.

llvm-svn: 185175
2013-06-28 17:58:04 +00:00
Justin Holewinski
0f70140107 [NVPTX] Remove i8 register class. PTX support for i8 (.b8, .u8, .s8) is rather poor and we're better off just ignoring it and letting LLVM expand all i8 ops out to i16.
llvm-svn: 185174
2013-06-28 17:57:59 +00:00
Justin Holewinski
9ae87e685a [NVPTX] Add support for vectorized function return values
llvm-svn: 185173
2013-06-28 17:57:55 +00:00
Justin Holewinski
7332dc0027 [NVPTX] Clean up handling of formal arguments and enable generation of vector parameter loads
llvm-svn: 185172
2013-06-28 17:57:53 +00:00
Weiming Zhao
b97c1a69a2 Bug 13662: Enable GPRPair for all i64 operands of inline asm on ARM
This patch assigns paired GPRs  for inline asm with
64-bit data on ARM. It's enabled for both ARM and Thumb to support modifiers
like %H, %Q, %R.

llvm-svn: 185169
2013-06-28 17:26:02 +00:00
Tom Stellard
99f122e9be R600: Add local memory support via LDS
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 185162
2013-06-28 15:47:08 +00:00
Tom Stellard
97e3c49801 R600: Add support for GROUP_BARRIER instruction
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 185161
2013-06-28 15:46:59 +00:00
Tim Northover
e13264c995 ARM: ensure fixed-point conversions have sane types
We were generating intrinsics for NEON fixed-point conversions that didn't
exist (e.g. float -> i16). There are two cases to consider:
  + iN is smaller than float. In this case we can do the conversion but need an
    extend or truncate as well.
  + iN is larger than float. In this case using the NEON conversion would be
    incorrect so we don't perform any combining.

llvm-svn: 185158
2013-06-28 15:29:25 +00:00
Manman Ren
5bedd08922 Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Tom Stellard
e3160dde2c R600: Remove alu-split.ll test
The purpose of this test was to check boundary conditions for the size
of an ALU clause.  This test is very sensitive to changes to the
optimizer or scheduler, because it requires an exact number of ALU
instructions in order to remain valid.  It's not good to have a test
this sensitive, because it is confusing to developers who implement
optimizations and then 'break' the test.

I'm not sure if there is a good way to test these limits using lit, but
if I can come up with replacement test that isn't as sensitive I'll add
it back to the tree.

llvm-svn: 185084
2013-06-27 17:00:38 +00:00
Joey Gouly
42f1898415 Add a Subtarget feature 'v8fp' to the ARM backend.
llvm-svn: 185073
2013-06-27 11:49:26 +00:00
Richard Sandiford
56466fde7f [SystemZ] Fix some embarrassing test typos
llvm-svn: 185070
2013-06-27 09:49:34 +00:00
Richard Sandiford
609a7eb0a1 [SystemZ] Allow LA and LARL to be rematerialized
llvm-svn: 185069
2013-06-27 09:42:10 +00:00
Richard Sandiford
a2d164d53e [SystemZ] Allow immediate moves to be rematerialized
llvm-svn: 185068
2013-06-27 09:38:48 +00:00
Richard Sandiford
964ffa104f [SystemZ] Add conditional store patterns
Add pseudo conditional store instructions, so that we use:

    branch foo:
    store
foo:

instead of:

    load
    branch foo:
    move
foo:
    store

z196 has real 32-bit and 64-bit conditional stores, but we don't use
any z196 instructions yet.

llvm-svn: 185065
2013-06-27 09:27:40 +00:00
Akira Hatanaka
1f2c22ad07 [mips] Improve code generation for constant multiplication using shifts, adds and
subs.
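
For example (a scalar sketch of the kind of strength reduction meant here; whether it beats a hardware multiply is the backend's per-constant decision):

  #include <cassert>
  #include <cstdint>

  // x*10 = (x<<3) + (x<<1), x*15 = (x<<4) - x; unsigned arithmetic wraps,
  // which matches two's-complement multiplication bit-for-bit.
  uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }
  uint32_t mul15(uint32_t x) { return (x << 4) - x; }

  int main() {
    for (uint32_t x : {0u, 3u, 1234u, 0xFFFFFFF9u /* -7 */}) {
      assert(mul10(x) == x * 10u);
      assert(mul15(x) == x * 15u);
    }
  }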

llvm-svn: 185011
2013-06-26 18:48:17 +00:00
Joey Gouly
f4bea6681b Add a subtarget feature 'v8' to the ARM backend.
This allows for targeting the ARMv8 AArch32 variant.

llvm-svn: 184967
2013-06-26 16:58:26 +00:00
Tim Northover
817190b1e4 ARM: allow predicated barriers in Thumb mode
The barrier instructions are only "always-execute" in ARM mode; they can quite
happily sit inside an IT block in Thumb.

llvm-svn: 184964
2013-06-26 16:52:32 +00:00
Joey Gouly
a82562fd5b Remove the 'generic' CPU from the ARM eabi attributes printer.
Make v4 the default ARM architecture attribute, to match CodeGen.

llvm-svn: 184962
2013-06-26 16:39:06 +00:00
Elena Demikhovsky
ea4d3808e5 Optimized the integer vector multiplication operation by replacing it with shift/xor/sub when possible. Fixed a bug in SDIV where the const operand is not a splat constant vector.
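
As a scalar model of one lane of such a combine (illustrative only, and it assumes arithmetic right shifts on signed values):

  #include <cassert>
  #include <cstdint>

  // Signed division by a power-of-two splat, done with shifts and an add
  // that rounds negative inputs toward zero (here the divisor is 8).
  int32_t sdiv_by_8(int32_t x) {
    uint32_t sign_fill = (uint32_t)(x >> 31);               // 0 or 0xFFFFFFFF
    int32_t biased = x + (int32_t)(sign_fill >> (32 - 3));  // add 7 if x < 0
    return biased >> 3;                                     // arithmetic shift
  }

  int main() {
    for (int32_t x : {-17, -9, -8, -1, 0, 1, 8, 17})
      assert(sdiv_by_8(x) == x / 8);
  }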
llvm-svn: 184931
2013-06-26 10:55:03 +00:00
Tom Stellard
3854f648a8 R600: Use new getNamedOperandIdx function generated by TableGen
llvm-svn: 184880
2013-06-25 21:22:18 +00:00
Aaron Watry
d619f91b82 R600: Add v2i32 test for vselect
Note: Only adding test for evergreen, not SI yet.

When I attempted to expand vselect for SI, I got the following:
llc: /home/awatry/src/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:522:
llvm::SDValue llvm::DAGTypeLegalizer::PromoteIntRes_SETCC(llvm::SDNode*):
Assertion `SVT.isVector() == N->getOperand(0).getValueType().isVector() &&
"Vector compare must return a vector result!"' failed.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184847
2013-06-25 13:55:54 +00:00
Aaron Watry
1ee98e598b R600/SI: Expand xor v2i32/v4i32
Add test cases for both vector sizes on SI and also add v2i32 test for EG.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184846
2013-06-25 13:55:52 +00:00
Aaron Watry
c7ec57ba3c R600: Add v2i32 test for setcc on evergreen
No test/expansion for SI has been added yet. Attempts to expand this
operation for SI resulted in a stacktrace in (IIRC) LegalizeIntegerTypes
which was complaining about vector comparisons being required to return
a vector type.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184845
2013-06-25 13:55:49 +00:00
Aaron Watry
73046ba281 R600/SI: Expand urem of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UREM produces really
complex code, so let's just check that the instruction was lowered
successfully.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184844
2013-06-25 13:55:46 +00:00
Aaron Watry
c00dd00a32 R600/SI: Expand udiv v[24]i32 for SI and v2i32 for EG
Also add lit test for both cases on SI, and v2i32 for evergreen.

Note: I followed the guidance of the v4i32 EG check... UDIV produces really
complex code, so let's just check that the instruction was lowered
successfully.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184843
2013-06-25 13:55:43 +00:00
Aaron Watry
0b4bbc3714 R600/SI: Expand ashr of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184842
2013-06-25 13:55:40 +00:00
Aaron Watry
0bf6dc888a R600/SI: Expand srl of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184841
2013-06-25 13:55:37 +00:00
Aaron Watry
eafbde78e9 R600/SI: Expand shl of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184840
2013-06-25 13:55:32 +00:00
Aaron Watry
d9f602bd35 R600/SI: Expand or of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184839
2013-06-25 13:55:29 +00:00
Aaron Watry
688f496d43 R600/SI: Expand mul of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184838
2013-06-25 13:55:26 +00:00
Aaron Watry
35d817a307 R600/SI: Expand and of v2i32/v4i32 for SI
Also add lit test for both cases on SI, and v2i32 for evergreen.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 184837
2013-06-25 13:55:23 +00:00
Andrew Trick
18751012bb Revert "Temporarily enable MI-Sched on X86."
This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c.

llvm-svn: 184823
2013-06-25 02:48:58 +00:00
Tom Stellard
fa154aaa39 R600/SI: Report unaligned memory accesses as legal for > 32-bit types
In reality, some unaligned memory accesses are legal for 32-bit types and
smaller too, but it all depends on the address space.  Allowing
unaligned loads/stores for > 32-bit types is mainly to prevent the
legalizer from splitting one load into multiple loads of smaller types.

https://bugs.freedesktop.org/show_bug.cgi?id=65873

llvm-svn: 184822
2013-06-25 02:39:35 +00:00
Tom Stellard
ddc78167d3 R600: Add support for i32 loads from the constant address space on Cayman
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 184821
2013-06-25 02:39:30 +00:00
Tom Stellard
b4ab710b43 R600/SI: Add support for v4i32 and v4f32 kernel args
Tested-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 184820
2013-06-25 02:39:25 +00:00
Tom Stellard
0300b2cb3a R600: Fix typo in R600Schedule.td
This should only make a difference in programs that use a lot of the
vector ALU instructions like BFI_INT and BIT_ALIGN.  There is a slight
improvement in the phatk bitcoin mining kernel with this patch on
Evergreen (vector size == 1):

Before:
1173 Instruction Groups / 9520 dwords

After:
1167 Instruction Groups / 9510 dwords

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184819
2013-06-25 02:39:20 +00:00
NAKAMURA Takumi
98994226f3 llvm/test/CodeGen/X86: Add explicit -mtriple=x86_64-unknown-unknown.
llvm-svn: 184731
2013-06-24 13:19:59 +00:00
NAKAMURA Takumi
c4251e5aff llvm/test/CodeGen/X86/legalize-shift-64.ll: Add explicit -mtriple=i686-unknown-unknown.
llvm-svn: 184730
2013-06-24 13:19:52 +00:00
Andrew Trick
186f0e6dbe Add -mcpu to some unit tests that only fail on certain hosts.
llvm-svn: 184709
2013-06-24 09:51:30 +00:00
Andrew Trick
716b547d13 Temporarily enable MI-Sched on X86.
Sorry for the unit test churn. I'll try to make the change permanent
next time.

llvm-svn: 184705
2013-06-24 09:13:20 +00:00
Andrew Trick
99b0b0ab75 Fix tail merging to assign the (more) correct BasicBlock when splitting.
This makes it possible to write unit tests that are less susceptible
to minor code motion, particularly copy placement. block-placement.ll
covers this case with -pre-RA-sched=source which will soon be
default. One incorrectly named block is already fixed, but without
this fix, enabling new coalescing and scheduling would cause more
failures.

llvm-svn: 184680
2013-06-24 01:55:01 +00:00
Andrew Trick
2fca851aaf Add MI-Sched support for x86 macro fusion.
This is an awful implementation of the target hook. But we don't have
abstractions yet for common machine ops, and I don't see any quick way
to make it table-driven.

llvm-svn: 184664
2013-06-23 09:00:28 +00:00
Reed Kotler
adddc55894 Replace with a shorter test case produced by Doug Gillmore.
llvm-svn: 184645
2013-06-22 19:35:08 +00:00
David Blaikie
b1e1db3db6 DebugInfo: Don't lose unreferenced non-trivial by-value parameters
A FastISel optimization was causing us to emit no information for such
parameters & when they go missing we end up emitting a different
function type. By avoiding that shortcut we not only get types correct
(very important) but also location information (handy) - even if it's
only live at the start of a function & may be clobbered later.

Reviewed/discussion by Evan Cheng & Dan Gohman.

llvm-svn: 184604
2013-06-21 22:56:30 +00:00
Michael Liao
d5d892d21f Add '-mcpu=' to prevent breaking on ATOM due to different code schedule
llvm-svn: 184591
2013-06-21 20:22:45 +00:00
Justin Holewinski
26b66eafe7 [NVPTX] Add support for selecting CUDA vs OCL mode based on triple
IR for CUDA should use "nvptx[64]-nvidia-cuda", and IR for NV OpenCL should use "nvptx[64]-nvidia-nvcl"

llvm-svn: 184579
2013-06-21 18:51:49 +00:00
Andrew Trick
eb3065a917 Add missing REQUIRES: asserts in crash.ll.
llvm-svn: 184576
2013-06-21 18:47:08 +00:00
Michael Liao
cc3b61f064 Fix PR16360
When (srl (anyextend x), c) is folded into (anyextend (srl x, c)), the
high bits are not cleared. Add an 'and' to clear them.

llvm-svn: 184575
2013-06-21 18:45:27 +00:00
Andrew Trick
777b242083 Update physreg live intervals during remat.
llvm-svn: 184574
2013-06-21 18:33:26 +00:00
Quentin Colombet
bdb6bfd975 ARM: Remove a (false) dependency on the memoryoperand's value as we do not use
it at the moment.
This allows us to form more paired loads even when the stack coloring pass
destroys the memoryoperand's value.

<rdar://problem/13978317>

llvm-svn: 184492
2013-06-20 22:51:44 +00:00
Tom Stellard
a1d0e771db R600/SI: Expand sub for v2i32 and v4i32 for SI
Also add a v2i32 test to the existing v4i32 test.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184482
2013-06-20 21:55:37 +00:00
Tom Stellard
c419716668 R600/SI: Expand add for v2i32 and v4i32
Also add SI tests to existing file and a v2i32 test for both
R600 and SI.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184481
2013-06-20 21:55:30 +00:00
Tom Stellard
a6ec00250a R600: Expand v2i32 load/store instead of custom lowering
The custom lowering causes llc to crash with a segfault.

Ideally, the custom lowering can be fixed, but this allows
programs which load/store v2i32 to work without crashing.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry<awatry@gmail.com>
llvm-svn: 184480
2013-06-20 21:55:23 +00:00
David Blaikie
e21853eb2f DebugInfo: don't use location lists when the location covers the whole function anyway
Fix up three tests - one that was relying on an abbreviation number,
another relying on a location list in this case (& testing raw asm;
changed that to use dwarfdump on the debug_info now that that's where
the location is), and another which was added in r184368 - exposing a
bug in that fix when we emit the location inline rather than through a
location list. Fix that bug while I'm here.

llvm-svn: 184387
2013-06-20 00:25:24 +00:00
Tim Northover
7700179b08 AArch64: remove accidental test output file.
llvm-svn: 184236
2013-06-18 21:16:53 +00:00
Quentin Colombet
337e6e6d1f During SelectionDAG building explicitly set a node to constant zero when the
value is zero.
This allows optimizations to kick in more easily.
Fix some test cases so that they remain meaningful (i.e., not completely dead
code) when optimizations apply.

<rdar://problem/14096009> superfluous multiply by high part of zero-extended
value.

llvm-svn: 184222
2013-06-18 20:14:39 +00:00
Andrew Trick
5a345856da Reenable, improve, and add MI-Sched unit tests.
llvm-svn: 184134
2013-06-17 21:45:16 +00:00
Vincent Lejeune
e4a87c6317 R600: PV stores Reg id, not index
llvm-svn: 184117
2013-06-17 20:16:40 +00:00
Vincent Lejeune
b04cbc8058 R600: Properly set COUNT_3 bit in TEX clause initiating inst for pre EG gen.
Fixes rv7x0 bug in Heaven reported here:
https://bugs.freedesktop.org/show_bug.cgi?id=64257

llvm-svn: 184116
2013-06-17 20:16:26 +00:00
Benjamin Kramer
2934370512 Switch spill weights from a basic loop depth estimation to BlockFrequencyInfo.
The main advantages here are way better heuristics, taking into account not
just loop depth but also __builtin_expect and other static heuristics and will
eventually learn how to use profile info. Most of the work in this patch is
pushing the MachineBlockFrequencyInfo analysis into the right places.

This is good for a 5% speedup on zlib's deflate (x86_64); there were some very
unfortunate spilling decisions in its hottest loop in longest_match(). Other
benchmarks I tried were mostly neutral.

This changes register allocation in subtle ways, update the tests for it.
2012-02-20-MachineCPBug.ll was deleted as it's very fragile and the instruction
it looked for was gone already (but the FileCheck pattern picked up unrelated
stuff).

llvm-svn: 184105
2013-06-17 19:00:36 +00:00
David Blaikie
f3d2951503 Debug Info: Simplify Frame Index handling in DBG_VALUE Machine Instructions
Rather than using the full power of target-specific addressing modes in
DBG_VALUEs with Frame Indices, simply use Frame Index + Offset. This
reduces the complexity of debug info handling down to two
representations of values (reg+offset and frame index+offset) rather
than three or four.

Ideally we could ensure that frame indices had been eliminated by the
time we reach assembly or DWARF generation, but I haven't spent the
time to figure out where the FIs are leaking through into that & whether
there's a good place to convert them. Some FI+offset=>reg+offset
conversion is done (see PrologEpilogInserter, for example) which is
necessary for some SelectionDAG assumptions about registers, I believe,
but it might be possible to make this a more thorough conversion &
ensure there are no remaining FIs no matter how instruction selection
is performed.

llvm-svn: 184066
2013-06-16 20:34:15 +00:00
David Blaikie
ef0e4640d7 DebugInfo: follow up to 184045 to constrain the tests further to ensure they don't contain +0 offsets
llvm-svn: 184046
2013-06-15 16:02:44 +00:00
David Blaikie
7a29358013 DebugInfo: print DBG_VALUE MachineInstrs with [] for deref and drop the offset when it's zero
llvm-svn: 184045
2013-06-15 15:52:58 +00:00
Andrew Trick
5d13fe97ed Machine Model: Add MicroOpBufferSize and resource BufferSize.
Replace the ill-defined MinLatency and ILPWindow properties with
straightforward buffer sizes:
MCSchedModel::MicroOpBufferSize
MCProcResourceDesc::BufferSize

These can be used to more precisely model instruction execution if desired.

Disabled some misched tests temporarily. They'll be reenabled in a few commits.

llvm-svn: 184032
2013-06-15 04:49:57 +00:00
David Blaikie
d1a8512cd8 Debug Info: Don't print the display name and colon prefix for DEBUG_VALUE comments if the display name is empty
llvm-svn: 184026
2013-06-15 00:33:47 +00:00
Tom Stellard
0f71ebc5a8 R600: Add SI load support for v[24]i32 and store for v2i32
Also add a separate vector lit test file, since r600 doesn't seem to handle
v2i32 load/store yet, but we can test both for SI.

Patch by: Aaron Watry

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 184021
2013-06-15 00:09:31 +00:00
Tom Stellard
428a19228a R600: Use correct encoding for Vertex Fetch instructions on Cayman
Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184016
2013-06-14 22:12:30 +00:00
Tom Stellard
c25852891f R600: Use EXPORT_RAT_INST_STORE_DWORD for stores on Cayman
We were using RAT_INST_STORE_RAW, which seemed to work, but the docs
say this instruction doesn't exist for Cayman, so it's probably safer
to use a documented instruction instead.

Reviewed-by: Vincent Lejeune<vljn at ovi.com>
llvm-svn: 184015
2013-06-14 22:12:24 +00:00
Tim Northover
11fabb62e9 Mark rematerialized super/sub registers as dead.
When we're rematerializing into a not-quite-right register we already add the
real definition as an imp-def, but we should also be marking the "official"
register as dead, since nothing else is going to use it as a result of this
remat.

Not doing this can affect pressure tracking.

rdar://problem/14158833

llvm-svn: 184002
2013-06-14 20:22:21 +00:00
Stephen Lin
ccd266c666 SelectionDAG: Fix incorrect condition checks in some cases of folding FADD/FMUL combinations; also improve accuracy of comments
llvm-svn: 183993
2013-06-14 18:17:35 +00:00
Derek Schuff
4c429cad03 Make PrologEpilogInserter save/restore all callee saved registers
in functions which call __builtin_unwind_init()

__builtin_unwind_init() is an undocumented gcc intrinsic which has this effect,
and is used in libgcc_eh.

Goes part of the way toward fixing PR8541.

llvm-svn: 183984
2013-06-14 16:15:29 +00:00
Benjamin Kramer
a0c15494c5 X86: cvtpi2ps is just an SSE instruction with MMX operands. It has no AVX equivalent.
Give it the right register format so we can also emit it when AVX is enabled.

llvm-svn: 183971
2013-06-14 09:31:41 +00:00
JF Bastien
cb60eaba94 Enable FastISel on ARM for Linux and NaCl, not MCJIT
This is a resubmit of r182877, which was reverted because it broke
MCJIT tests on ARM. The patch leaves MCJIT on ARM as it was before: only
enabled for iOS. I've CC'ed people from the original review and revert.

FastISel was only enabled for iOS ARM and Thumb2, this patch enables it
for ARM (not Thumb2) on Linux and NaCl, but not MCJIT.

Thumb2 support needs a bit more work, mainly around register class
restrictions.

The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.

The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch, it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.

The test changes are straightforward, similar to:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.

I ran all of lnt test-suite on A15 hardware with --optimize-option=-O0
and all the tests pass. All the tests also pass on x86 make check-all. I
also re-ran the check-all tests that failed on ARM, and they all seem to
pass.

llvm-svn: 183966
2013-06-14 02:49:43 +00:00
Bill Schmidt
fac3a79629 [PowerPC] Disable fast-isel for existing -O0 tests for PowerPC.
This is a preliminary patch for fast instruction selection on
PowerPC.  Code generation can differ between DAG isel and fast isel.
Existing tests that specify -O0 were written to expect DAG isel.  Make
this explicit by adding -fast-isel=false to the tests.

In some cases specifying -fast-isel=false produces different code even
when there isn't a fast instruction selector specified.  This is
because TM.Options.EnableFastISel = 1 at -O0 whether or not a FastISel
object exists.  Thus disabling fast isel can actually produce less
conservative code.  Because of this, some of the expected code
generation in the -O0 tests needs to be adjusted.

In particular, handling of function arguments is less conservative
with -fast-isel=false (see isOnlyUsedInEntryBlock() in
SelectionDAGBuilder.cpp).  This results in fewer stack accesses and,
in some cases, reduced stack size as uselessly loaded values are no
longer stored back to spill locations in the stack.

No functional change with this patch; test case adjustments only.

llvm-svn: 183939
2013-06-13 20:23:34 +00:00
Akira Hatanaka
8332c68b45 [mips] Add an IR transformation pass that optimizes calls to sqrt.
The pass emits a call to sqrt that has attribute "read-none". This call will be
converted to an ISD::FSQRT node during DAG construction, which will turn into
a mips native sqrt instruction.
 

llvm-svn: 183802
2013-06-11 22:21:44 +00:00
Tim Northover
4ba890d132 X86: Stop LEA64_32r doing unspeakable things to its arguments.
Previously LEA64_32r went through virtually the entire backend thinking it was
using 32-bit registers until its blissful illusions were cruelly snatched away
by MCInstLower and 64-bit equivalents were substituted at the last minute.

This patch makes it behave normally, and take 64-bit registers as sources all
the way through. Previous uses (for 32-bit arithmetic) are accommodated via
SUBREG_TO_REG instructions which make the types and classes agree properly.

llvm-svn: 183693
2013-06-10 20:43:49 +00:00
Justin Holewinski
fa2e2511f8 [NVPTX] Remove old CONST_NOT_GEN address space that is not being used anymore and causes constants to be emitted in the global address space
llvm-svn: 183652
2013-06-10 13:29:47 +00:00
JF Bastien
be1c9b906e Add test for ARM FastISel load/store register classes
r183624 fixed an issue that was tested indirectly. Test it directly with this new test.

llvm-svn: 183634
2013-06-10 00:35:57 +00:00
Reed Kotler
8176eeb183 Fix a regression I introduced when I expanded the complex pseudos in
the Mips16 port. A few of the pseudos could take either signed
or unsigned arguments; I did not distinguish the cases and improperly
rejected some valid cases that the assembler had previously accepted
when they were pure pseudos that expanded as assembly instructions.

llvm-svn: 183633
2013-06-09 23:23:46 +00:00
Logan Chien
61f1a9f5d2 Refine the ARM EHABI test cases.
Since we have ARM unwind directive parser and assembler, we
can check the correctness in two stages:

1. From LLVM assembly (.ll) to ARM assembly (.s)
2. From ARM assembly (.s) to ELF object file (.o)

We already have several "*.s to *.o" test cases.  This CL adds
some "*.ll to *.s" test cases and removes the redundant "*.ll to *.o"
test cases.

New test cases to check "*.ll to *.s" code generator:

- ehabi.ll: Check the correctness of the generated unwind directives.
- section-name.ll: Check the section name of functions.

Removed test cases:

- ehabi-mc-cantunwind.ll
  (Covered by ehabi-cantunwind.ll, and eh-directive-cantunwind.s)
- ehabi-mc-compact-pr0.ll
  (Covered by ehabi.ll, eh-compact-pr0.s, eh-directive-save.s, and
   eh-directive-setfp.s)
- ehabi-mc-compact-pr1.ll
  (Covered by ehabi.ll, eh-compact-pr1.s, eh-directive-save.s, and
   eh-directive-setfp.s)
- ehabi-mc.ll
  (Covered by ehabi.ll, and eh-directive-integrated-test.s)
- ehabi-mc-section-group.ll
  (Covered by section-name.ll, and eh-directive-section-comdat.s)
- ehabi-mc-section.ll
  (Covered by section-name.ll, and eh-directive-section.s)
- ehabi-mc-sh_link.ll
  (Covered by eh-directive-text-section.s, and eh-directive-section.s)

llvm-svn: 183628
2013-06-09 12:36:57 +00:00
Logan Chien
fa2266ded0 Fix ARM unwind opcode assembler in several cases.
Changes to ARM unwind opcode assembler:

* Fix multiple .save or .vsave directives.  Besides, the
  order is preserved now.

* For the directives which will generate multiple opcodes,
  such as ".save {r0-r11}", the order of the unwind opcode
  is fixed now, i.e. the registers with lower encoding values
  are popped first.

* Fix the $sp offset calculation.  Now, we can use the
  .setfp, .pad, .save, and .vsave directives in any order.

Changes to test cases:

* Add test cases to check the order of multiple opcodes
  for the .save directive.

* Fix the incorrect $sp offset in the test case.  The
  stack pointer offset specified in the test case was
  incorrect.  (Changed test cases: ehabi-mc-section.ll and
  ehabi-mc.ll)

* The opcodes to restore $sp are slightly reordered.  The
  behavior is not changed, and the new output is the same
  as the output of GNU as.  (Changed test cases:
  eh-directive-pad.s and eh-directive-setfp.s)

llvm-svn: 183627
2013-06-09 12:22:30 +00:00
Elena Demikhovsky
1c6fe6138b Removed PackedDouble domain from scalar instructions. Added more formats for the scalar stuff.
llvm-svn: 183626
2013-06-09 07:37:10 +00:00
Venkatraman Govindaraju
9767317993 [Sparc] Delete FPMover Pass and remove Fp* Pseudo-instructions from Sparc backend.
llvm-svn: 183613
2013-06-08 15:32:59 +00:00
Quentin Colombet
288faf08d2 Reapply r183552. This time, use a standard type for the option to avoid a
template instantiation issue with a non-standard type.

Add a backend option to warn on a given stack size limit.
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.

The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>.

llvm-svn: 183595
2013-06-08 00:07:54 +00:00
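
A minimal C sketch of the kind of function the new option is meant to flag (the
function and the 512-byte limit here are illustrative, not from the commit):

    /* Built with something like: clang -c -mllvm -warn-stack-size=512 big_frame.c
       the warning described above is expected for this function, since the local
       buffer alone exceeds the 512-byte limit. */
    void consume(char *p);

    void big_frame(void) {
      char buf[4096];   /* large stack allocation */
      consume(buf);
    }
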
Vincent Lejeune
2f252fdf26 R600: Anti dep better handled in tex clause
llvm-svn: 183592
2013-06-07 23:30:26 +00:00
Jakob Stoklund Olesen
3a549110cd Add missing zextloadi1 to i64 patterns. PR16721.
llvm-svn: 183587
2013-06-07 22:55:05 +00:00
Hal Finkel
9c0ef0659d Disallow i64 div/rem in PPC32 counter loops
On PPC32, [su]div,rem on i64 types are transformed into runtime library
function calls. As a result, they are not allowed in counter-based loops (the
counter-loops verification pass caught this error; this change fixes PR16169).

llvm-svn: 183581
2013-06-07 22:16:19 +00:00
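
A hedged C illustration of the situation described above (the function is made
up): on a 32-bit PowerPC target, the 64-bit division below is expected to become
a call to a runtime routine such as __divdi3, so the enclosing loop may not be
converted into a counter-based (CTR) loop.

    /* i64 division inside a loop body; on ppc32 this lowers to a libcall,
       which the counter-loops verification must now reject. */
    long long sum_quotients(const long long *a, long long d, int n) {
      long long s = 0;
      for (int i = 0; i < n; ++i)
        s += a[i] / d;
      return s;
    }
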
Quentin Colombet
8f78b800ab Revert commits related to stack warning.
llvm-svn: 183579
2013-06-07 22:14:50 +00:00
Quentin Colombet
4a1b82f036 Explicit triple in warn stack size test cases to not depend on OS.
llvm-svn: 183574
2013-06-07 21:09:42 +00:00
Tom Stellard
7c091ffbf7 R600: Fix calculation of stack offset in AMDGPUFrameLowering
We weren't computing structure size correctly and we were relying on
the original alloca instruction to compute the offset, which isn't
always reliable.

Reviewed-by: Vincent Lejeune <vljn@ovi.com>
llvm-svn: 183568
2013-06-07 20:52:05 +00:00
Tom Stellard
f4646ab025 R600: Fix the fetch limits for R600 generation GPUs
Reviewed-by: Vincent Lejeune <vljn@ovi.com>

https://bugs.freedesktop.org/show_bug.cgi?id=64257

llvm-svn: 183560
2013-06-07 20:28:55 +00:00
Quentin Colombet
474d9dcceb Add a backend option to warn on a given stack size limit.
Option: -mllvm -warn-stack-size=<limit>
Output (if limit is exceeded):
warning: Stack size limit exceeded (<actual size>) in <functionName>.

The longer term plan is to hook that to a clang warning.
PR:4072
<rdar://problem/13987214>

llvm-svn: 183552
2013-06-07 20:18:12 +00:00
JF Bastien
c950ffbe08 ARM FastISel integer sext/zext improvements
My recent ARM FastISel patch exposed this bug:
  http://llvm.org/bugs/show_bug.cgi?id=16178
The root cause is that it can't select integer sext/zext pre-ARMv6 and
asserts out.

The current integer sext/zext code doesn't handle other cases gracefully
either, so this patch makes it handle all sext and zext from i1/i8/i16
to i8/i16/i32, with and without ARMv6, both in Thumb and ARM mode. This
should fix the bug as well as make FastISel faster because it bails to
SelectionDAG less often. See fastisel-ext.patch for this.

fastisel-ext-tests.patch changes current tests to always use reg-imm AND
for 8-bit zext instead of UXTB. This simplifies code since it is
supported on ARMv4t and later, and at least on A15 both should perform
exactly the same (both have exec 1 uop 1, type I).

2013-05-31-char-shift-crash.ll is a bitcode version of the above bug
16178 repro.

fast-isel-ext.ll tests all sext/zext combinations that ARM FastISel
should now handle.

Note that my ARM FastISel enabling patch was reverted due to a separate
failure when dealing with MCJIT; I'll fix this second failure and then
turn FastISel on again for non-iOS ARM targets.

I've tested "make check-all" on my x86 box, and "lnt test-suite" on A15
hardware.

llvm-svn: 183551
2013-06-07 20:10:37 +00:00
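
Two tiny C functions of the shape this patch lets FastISel handle directly
(illustrative only; the commit's own coverage is in fast-isel-ext.ll):

    /* Sign extension from i8 and zero extension from i16, both selectable by
       FastISel after this change, with or without ARMv6 extension instructions. */
    int widen_signed(signed char c) {
      return c;            /* sext i8 -> i32 */
    }

    unsigned int widen_unsigned(unsigned short s) {
      return s;            /* zext i16 -> i32 */
    }
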
Quentin Colombet
a8970e620f Teach AsmPrinter how to print odd constants.
Fix an assertion when the compiler encounters big constants whose bit width is
not a multiple of 64 bits.
Although clang would never generate something like this, the backend should be
able to handle any legal IR.

<rdar://problem/13363576>

llvm-svn: 183544
2013-06-07 18:36:03 +00:00
Roman Divacky
8439b144e6 Fix a typo in asm string of BP* family of instructions. With this fix
I am able to compile/assemble/link/run /bin/echo from FreeBSD.

llvm-svn: 183537
2013-06-07 17:46:57 +00:00
Rafael Espindola
7b8382bcbc Support OpenBSD's native frame protection conventions.
OpenBSD's stack smashing protection differs slightly from other
platforms:

  1. The smash handler function is "__stack_smash_handler(const char
     *funcname)" instead of "__stack_chk_fail(void)".

  2. There's a hidden "long __guard_local" object that gets linked
     into each executable and DSO.

Patch by Matthew Dempsky.

llvm-svn: 183533
2013-06-07 16:35:57 +00:00
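
For reference, a C sketch of the OpenBSD interface described above, using the
two names from the commit message (declarations only; visibility and linkage
details are as described, not verified here):

    /* Called instead of __stack_chk_fail when a smashed stack is detected. */
    void __stack_smash_handler(const char *funcname);

    /* Hidden per-module guard value linked into each executable and DSO. */
    extern long __guard_local;
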
Venkatraman Govindaraju
867986b1ff [Sparc]: Use cmp instruction instead of subcc to compare integers.
llvm-svn: 183463
2013-06-07 00:03:36 +00:00
Vincent Lejeune
dd2a468cbd R600: Add a pass that merge Vector Register
Previously committed as r183279, but tests were failing; reverted in r183286.
It was broken because r183336 was missing; now it is there.

llvm-svn: 183343
2013-06-05 21:38:04 +00:00
Vincent Lejeune
cc7d08e974 R600: Schedule copy from phys register at beginning of block
This allows the regalloc pass to remove them by trivially assigning the associated reg.

llvm-svn: 183336
2013-06-05 20:27:35 +00:00
Akira Hatanaka
b5546583ad [mips] brcond + setgt/setugt instruction selection patterns.
llvm-svn: 183334
2013-06-05 19:49:55 +00:00
Michael Liao
d464e3d65b [PATCH] Fix VGATHER* operand constraints
Add earlyclobber constraints to prevent an input register from being allocated as
the output register because, according to the Intel spec [1], "If any pair
of the index, mask, or destination registers are the same, this
instruction results a UD fault."

---
[1] http://software.intel.com/sites/default/files/319433-014.pdf

llvm-svn: 183327
2013-06-05 18:12:26 +00:00
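
A small AVX2 intrinsics example in C of the instruction class being constrained
(the intrinsic usage is an assumption for illustration, not from the commit):
after this change the register allocator may not assign the destination, index,
and mask operands of the resulting VPGATHERDD to the same ymm register.

    #include <immintrin.h>

    /* Requires -mavx2. */
    __m256i gather8(const int *base, __m256i idx) {
      __m256i src  = _mm256_setzero_si256();
      __m256i mask = _mm256_set1_epi32(-1);   /* gather all eight lanes */
      return _mm256_mask_i32gather_epi32(src, base, idx, mask, 4);
    }
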
Tom Stellard
ecf6bd2eaa R600: Make sure to schedule AR register uses and defs in the same clause
Reviewed-by: vljn at ovi.com
llvm-svn: 183294
2013-06-05 03:43:06 +00:00
Rafael Espindola
f5e919b2e6 Revert "R600: Add a pass that merge Vector Register"
This reverts commit r183279. CodeGen/R600/texture-input-merge.ll was failing.

llvm-svn: 183286
2013-06-05 01:48:30 +00:00
Vincent Lejeune
57d56af481 R600: Add a pass that merge Vector Register
llvm-svn: 183279
2013-06-04 23:17:26 +00:00
Vincent Lejeune
7c89765008 R600: Const/Neg/Abs can be folded to dot4
llvm-svn: 183278
2013-06-04 23:17:15 +00:00
Evan Cheng
1c010771f4 Cortex-R5 can issue Thumb2 integer division instructions.
llvm-svn: 183275
2013-06-04 22:52:09 +00:00
David Majnemer
d0dc0d58f6 ARM: Fix crash in ARM backend inside of ARMConstantIslandPass
The ARM backend did not expect LDRBi12 to hold a constant pool operand.
Allow LLVM to deal with the instruction similarly to how it deals with
LDRi12.

This fixes PR16215.

llvm-svn: 183238
2013-06-04 17:46:15 +00:00
Vincent Lejeune
8d2ef79cb9 R600: Swizzle texture/export instructions
llvm-svn: 183229
2013-06-04 15:04:53 +00:00
Vincent Lejeune
e7cc832b43 R600: Add a test for r183108
llvm-svn: 183228
2013-06-04 15:03:35 +00:00
Tom Stellard
0c2bbb2a1f R600/SI: Add support for work item and work group intrinsics
llvm-svn: 183138
2013-06-03 17:40:18 +00:00
Tom Stellard
0faf53682e R600/SI: Add a calling convention for compute shaders
llvm-svn: 183137
2013-06-03 17:40:11 +00:00
Tom Stellard
47a52f3e69 R600/SI: Custom lower i64 sign_extend
llvm-svn: 183136
2013-06-03 17:40:03 +00:00
Tom Stellard
7e44e13b15 R600/SI: Add support for global loads
llvm-svn: 183131
2013-06-03 17:39:43 +00:00
Vincent Lejeune
55871f8f8a R600: use capital letter for PV channel
llvm-svn: 183107
2013-06-03 15:44:35 +00:00
Venkatraman Govindaraju
2d5d39937e Sparc: Add support for indirect branch and blockaddress in Sparc backend.
llvm-svn: 183094
2013-06-03 05:58:33 +00:00
Venkatraman Govindaraju
b47bd839e0 Sparc: When storing 0, use %g0 directly in the store instruction instead of
using two instructions (sethi and store).

llvm-svn: 183090
2013-06-03 00:21:54 +00:00
Venkatraman Govindaraju
0514a4c845 Sparc: Combine add/or/sethi instruction with restore if possible.
llvm-svn: 183088
2013-06-02 21:48:17 +00:00
Venkatraman Govindaraju
acb910b7ae Sparc: Perform leaf procedure optimization by default
llvm-svn: 183083
2013-06-02 02:24:27 +00:00
Venkatraman Govindaraju
2425aef2ad Sparc: Mark functions calling llvm.vastart and llvm.returnaddress intrinsics as non-leaf functions.
llvm-svn: 183079
2013-06-01 20:42:48 +00:00
Tim Northover
e84e621d63 Revert r183069: "TMP: LEA64_32r fixing"
Very sorry, it was committed from the wrong branch by mistake.

llvm-svn: 183070
2013-06-01 10:23:46 +00:00
Tim Northover
93287c3991 TMP: LEA64_32r fixing
llvm-svn: 183069
2013-06-01 10:21:54 +00:00
Tim Northover
8efc0e4868 X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts a (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.

This fixes a typo in the opcode field of the original patch, which
should make the legacy JIT work again (and adds a test for that problem).

llvm-svn: 183068
2013-06-01 09:55:14 +00:00
Venkatraman Govindaraju
1eaf496598 [Sparc] Generate correct code for leaf functions with stack objects
llvm-svn: 183067
2013-06-01 04:51:18 +00:00
Eric Christopher
e4ab862999 Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it
seems to have caused PR16192 and other JIT related failures.

llvm-svn: 183059
2013-05-31 23:30:45 +00:00
Quentin Colombet
c3a4f33cc1 Modify how the formulae are rated in Loop Strength Reduce.
Namely, check whether the target allows folding more than one register into the
addressing mode and, if so, adjust the cost accordingly.

Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.

<rdar://problem/13973908>

llvm-svn: 183021
2013-05-31 17:20:29 +00:00
Richard Sandiford
77f91408dd [SystemZ] Don't use LOAD and STORE REVERSED for volatile accesses
Unlike most -- hopefully "all other", but I'm still checking -- memory
instructions we support, LOAD REVERSED and STORE REVERSED may access
the memory location several times.  This means that they are not suitable
for volatile loads and stores.

This patch is a prerequisite for better atomic load and store support.
The same principle applies there: almost all memory instructions we
support are inherently atomic ("block concurrent"), but LOAD REVERSED
and STORE REVERSED are exceptions.

Other instructions continue to allow volatile operands.  I will add
positive "allows volatile" tests at the same time as the "allows atomic
load or store" tests.

llvm-svn: 183002
2013-05-31 13:25:22 +00:00
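
A C sketch of the distinction drawn above, assuming the backend matches a
bswap-of-load pattern to LOAD REVERSED: with a volatile pointer it must now fall
back to a plain load followed by a separate byte swap.

    unsigned int load_swapped(volatile unsigned int *p) {
      /* Volatile access: may not be lowered to LRV, which can touch the
         memory location more than once. */
      return __builtin_bswap32(*p);
    }
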
Justin Holewinski
d925e36ab2 [NVPTX] Re-enable support for virtual registers in the final output
Now that 3.3 is branched, we are re-enabling virtual registers to help
iron out bugs before the next release. Some of the post-RA passes do
not play well with virtual registers, so we disable them for now. The
needed functionality of the PrologEpilogInserter pass is copied to a
new backend-specific NVPTXPrologEpilog pass.

The test for this commit is that the existing tests do not break.

llvm-svn: 182998
2013-05-31 12:14:49 +00:00
Tim Northover
8940245595 X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts an
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.

llvm-svn: 182991
2013-05-31 09:57:13 +00:00
Akira Hatanaka
13f3fde46f [mips] Big-endian code generation for atomic instructions.
Patch by Jyun-Yan You.

llvm-svn: 182984
2013-05-31 03:25:44 +00:00
Rafael Espindola
d23482285b Revert r182937 and r182877.
r182877 broke MCJIT tests on ARM, and r182937 was working around another failure
caused by r182877.

This should make the ARM bots green.

llvm-svn: 182960
2013-05-30 20:37:52 +00:00
Benjamin Kramer
f41318934e Force a triple so we don't get bitten by windows' different regalloc.
llvm-svn: 182935
2013-05-30 15:39:35 +00:00
Benjamin Kramer
8b559051e4 Force fragile test to the atom scheduler model.
The pattern the test originally checked for doesn't occur on other -mcpu
settings. On atom it's still there though slightly differently scheduled.

llvm-svn: 182933
2013-05-30 15:22:28 +00:00
Tim Northover
2b261b4ac3 X86: allow registers 8-15 in test
This test was failing on some hosts when an unexpected register was used for a
variable. This just extends the regexp to allow the new x86-64 registers.

llvm-svn: 182929
2013-05-30 13:56:32 +00:00
Tim Northover
aa5932cde5 X86: use sub-register sequences for MOV*r0 operations
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.

Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.

llvm-svn: 182928
2013-05-30 13:19:42 +00:00
Justin Holewinski
099d52887f [NVPTX] Fix case where a sext load of an i1 type may produce an
ld.u1 instead of an ld.u8.

llvm-svn: 182924
2013-05-30 12:22:39 +00:00
Richard Sandiford
b7ab6fd782 [SystemZ] Enable unaligned accesses
The code to distinguish between unaligned and aligned addresses was
already there, so this is mostly just a switch-on-and-test process.

llvm-svn: 182920
2013-05-30 09:45:42 +00:00
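
A C sketch of an access that benefits (the struct is illustrative): with
unaligned accesses enabled, the byte-aligned field load below can use an
ordinary 4-byte load instead of being assembled from single bytes.

    struct __attribute__((packed)) rec {
      char tag;
      unsigned int value;    /* at offset 1, so only 1-byte aligned */
    };

    unsigned int get_value(const struct rec *r) {
      return r->value;
    }
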
Rafael Espindola
5b34d5a3c7 Change how we iterate over relocations on ELF.
For COFF and MachO, sections semantically have relocations that apply to them.
That is not the case on ELF.

In relocatable objects (.o), a section with relocations in ELF has offsets to
another section where the relocations should be applied.

In dynamic objects and executables, relocations don't have an offset, they have
a virtual address. The section sh_info may or may not point to another section,
but that is not actually used for resolving the relocations.

This patch exposes that in the ObjectFile API. It has the following advantages:

* Most (all?) clients can handle this more efficiently. They will normally walk
all relocations, so making an effort to iterate in a particular order doesn't
save time.

* llvm-readobj now prints relocations in the same way the native readelf does.

* Probably most important, relocations that don't point to any section are now
visible. This is the case for relocations in the rela.dyn section. See the
updated relocation-executable.test for example.

llvm-svn: 182908
2013-05-30 03:05:14 +00:00
Bill Wendling
cab3fa1f1c This testcase tests command line attributes which we don't yet support.
In fact, we're probably going to support these flags in completely different
ways. So this test is no longer valid.

llvm-svn: 182899
2013-05-30 00:32:04 +00:00
Andrew Trick
aec414c298 Order CALLSEQ_START and CALLSEQ_END nodes.
Fixes PR16146: gdb.base__call-ar-st.exp fails after
pre-RA-sched=source fixes.

Patch by Xiaoyi Guo!

This also fixes an unsupported dbg.value test case. Codegen was
previously incorrect but the test was passing by luck.

llvm-svn: 182885
2013-05-29 22:03:55 +00:00
JF Bastien
235d8c9117 Enable FastISel on ARM for Linux and NaCl
FastISel was only enabled for iOS ARM and Thumb2; this patch enables it
for ARM (not Thumb2) on Linux and NaCl.

Thumb2 support needs a bit more work, mainly around register class
restrictions.

The patch punts to SelectionDAG when doing TLS relocation on non-Darwin
targets. I will fix this and other FastISel-to-SelectionDAG failures in
a separate patch.

The patch also forces FastISel to retain frame pointers: iOS always
keeps them for backtracking (so emitted code won't change because of
this), but Linux was getting much worse code that was incorrect when
using big frames (such as test-suite's lencod). I'll also fix this in a
later patch; it will probably require a peephole so that FastISel
doesn't rematerialize frame pointers back-to-back.

The test changes are straightforward, similar to:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130513/174279.html
They also add a vararg test that got dropped in that change.

I ran all of test-suite on A15 hardware with --optimize-option=-O0 and
all the tests pass.

llvm-svn: 182877
2013-05-29 20:38:10 +00:00
Tim Northover
db2d7a34b2 Teach ReMaterialization to be more cunning about subregisters
This allows rematerialization during register coalescing to handle
more cases involving operations like SUBREG_TO_REG which might need to
be rematerialized using sub-register indices.

For example, code like:
    v1(GPR64):sub_32 = MOVZ something
    v2(GPR64) = COPY v1(GPR64)
should be convertable to:
    v2(GPR64):sub_32 = MOVZ something

but previously we just gave up in places like this

llvm-svn: 182872
2013-05-29 19:32:06 +00:00
Richard Sandiford
b70371b744 [SystemZ] Two tests missing from previous commit
llvm-svn: 182847
2013-05-29 11:59:26 +00:00
Richard Sandiford
b62e20c071 [SystemZ] Immediate compare-and-branch support
This patch adds support for the CIJ and CGIJ instructions.

llvm-svn: 182846
2013-05-29 11:58:52 +00:00
Venkatraman Govindaraju
cb40ce1f29 [Sparc] Add support for leaf functions in sparc backend.
llvm-svn: 182822
2013-05-29 04:46:31 +00:00
Richard Sandiford
4b6cfd7cec [SystemZ] Register compare-and-branch support
This patch adds support for the CRJ and CGRJ instructions.  Support for
the immediate forms will be a separate patch.

The architecture has a large number of comparison instructions.  I think
it's generally better to concentrate on using the "best" comparison
instruction first and foremost, then only use something like CRJ if
CR really was the natural choice of comparison instruction.  The patch
therefore opportunistically converts separate CR and BRC instructions
into a single CRJ while emitting instructions in ISelLowering.

llvm-svn: 182764
2013-05-28 10:41:11 +00:00
Preston Gurd
8be7e42cc2 Convert sqrt functions into sqrt instructions when -ffast-math is in effect.
When -ffast-math is in effect (on Linux, at least), clang defines
__FINITE_MATH_ONLY__ > 0 when including <math.h>. This causes the
preprocessor to include <bits/math-finite.h>, which renames the sqrt functions.
For instance, "sqrt" is renamed as "__sqrt_finite". 

This patch adds the 3 new names in such a way that they will be treated
as equivalent to their respective original names.

llvm-svn: 182739
2013-05-27 15:44:35 +00:00
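
A C sketch of the renaming being accommodated, assuming a glibc-style <math.h>
as described above: under -ffast-math the sqrt call below reaches the backend as
__sqrt_finite and should still be turned into a sqrt instruction.

    #include <math.h>

    double hyp(double a, double b) {
      return sqrt(a * a + b * b);   /* may arrive as a call to __sqrt_finite */
    }
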
Rafael Espindola
bf27725489 Add a cpu to try to bring back the atom bots.
llvm-svn: 182734
2013-05-27 13:22:52 +00:00
Hal Finkel
1f5ee2fefe Prefer to duplicate PPC Altivec loads when expanding unaligned loads
When expanding unaligned Altivec loads, we use the decremented offset trick to
prevent page faults. Unfortunately, if we have a sequence of consecutive
unaligned loads, this leads to suboptimal code generation because the 'extra'
load from the first unaligned load can be combined with the base load from the
second (but only if the decremented offset trick is not used for the first).
Search up and down the chain, through loads and token factors, looking for
consecutive loads, and if one is found, don't use the offset reduction trick.
These duplicate loads are later combined to yield the desired sequence (in the
future, we might want a more-powerful chain search, but that will require some
changes to allow the combiner routines to access the AA object).

This should complete the initial implementation of the optimized unaligned
Altivec load expansion. There is some refactoring that should be done, but
that will happen when the unaligned store expansion is added.

llvm-svn: 182719
2013-05-26 18:08:30 +00:00
Andrew Trick
c92ce8a4f2 Fix PR16143: Insert DEBUG_VALUE before terminator.
llvm-svn: 182717
2013-05-26 08:58:50 +00:00
Hal Finkel
f5d061cce9 PPC: Combine duplicate (offset) lvsl Altivec intrinsics
The lvsl permutation control instruction is a function only of the alignment of
the pointer operand (relative to the 16-byte natural alignment of Altivec
vectors). As a result, multiple lvsl intrinsics where the operands differ by a
multiple of 16 can be combined.

llvm-svn: 182708
2013-05-25 04:05:05 +00:00
Andrew Trick
53ca6c9254 Track IR ordering of SelectionDAG nodes 4/4.
Unit test cases for -pre-RA-sched=source.

llvm-svn: 182706
2013-05-25 03:26:51 +00:00
Andrew Trick
34c31df32a Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to the new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Hal Finkel
b8fe2ab5cb PPC: Initial support for permutation-based unaligned Altivec loads
Altivec only directly supports aligned loads, but the loads have a strange
property: If given an unaligned address, they truncate the address to the next
lower aligned address, and load from there.  This property, along with an extra
load and some special-purpose permutation-control instructions that generate
the appropriate permutations from the original unaligned address, allows
efficient lowering of unaligned loads. This code uses the trick explained in the
Apple Velocity Engine optimization overview document to prevent the needed
extra load from possibly causing a page fault if the original address happens
to be aligned.

As noted in the FIXMEs, there are several additional optimizations that can be
performed to reduce the cost of these loads even more. These will be
implemented in future commits.

llvm-svn: 182691
2013-05-24 23:00:14 +00:00
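
A hand-written C/AltiVec sketch of the classic sequence the lowering above is
based on (intrinsics shown for illustration; the backend emits the equivalent
nodes, not this source):

    #include <altivec.h>

    /* Requires -maltivec. */
    vector unsigned char load_unaligned(const unsigned char *p) {
      vector unsigned char lo   = vec_ld(0, p);    /* load from rounded-down address */
      vector unsigned char hi   = vec_ld(15, p);   /* decremented offset; no page fault if p is aligned */
      vector unsigned char perm = vec_lvsl(0, p);  /* permute control from the address */
      return vec_perm(lo, hi, perm);
    }
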
Diego Novillo
d1f091f169 Add a new function attribute 'cold' to functions.
Other than recognizing the attribute, the patch does little else.
It changes the branch probability analyzer so that edges into
blocks postdominated by a cold function are given low weight.

Added analysis and code generation tests.  Added documentation for the
new attribute.

llvm-svn: 182638
2013-05-24 12:26:52 +00:00
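
A C sketch of how the attribute is expected to be used, assuming the frontend
forwards __attribute__((cold)) as the new 'cold' IR attribute: the branch into
the error block below should then receive a low weight.

    __attribute__((cold)) void report_error(const char *msg);

    int checked_div(int a, int b) {
      if (b == 0) {                    /* edge into a block ending in a cold call */
        report_error("division by zero");
        return 0;
      }
      return a / b;
    }
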
Tim Northover
9b6ef4d68e ARM: implement @llvm.readcyclecounter intrinsic
This implements the @llvm.readcyclecounter intrinsic as the specific
MRC instruction specified in the ARM manuals for CPUs with the Power
Management extensions.

Older CPUs had slightly different methods which may also have to be
implemented eventually, but this should cover all v7 cases.

rdar://problem/13939186

llvm-svn: 182603
2013-05-23 19:11:20 +00:00
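
A C sketch using Clang's builtin for the intrinsic (the builtin name is Clang's,
not part of this commit): on ARMv7 with the Power Management extensions this
should now lower to the MRC read of the cycle counter.

    unsigned long long cycles_now(void) {
      return __builtin_readcyclecounter();   /* lowers to @llvm.readcyclecounter */
    }
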
Tom Stellard
dee18e3abb R600: Fix R600ControlFlowFinalizer not considering VTX_READ 128 bit dst reg
Patch by: Vincent Lejeune

https://bugs.freedesktop.org/show_bug.cgi?id=64877

NOTE: This is a candidate for the 3.3 branch.
llvm-svn: 182600
2013-05-23 18:26:42 +00:00
Jakob Stoklund Olesen
b4fe10613b Fix PR16110: Handle DBG_VALUE in ConnectedVNInfoEqClasses::Distribute().
Now that the LiveDebugVariables pass is running *after* register
coalescing, the ConnectedVNInfoEqClasses class needs to deal with
DBG_VALUE instructions.

This only comes up when rematerialization during coalescing causes the
remaining live range of a virtual register to separate into two
connected components.

llvm-svn: 182592
2013-05-23 17:02:23 +00:00
Nick Lewycky
f93dded11b Add missing test from r175092.
llvm-svn: 182564
2013-05-23 07:46:13 +00:00
Nadav Rotem
bfc39207bf X86: Fix a bug in EltsFromConsecutiveLoads. We can't generate new loads without chains.
llvm-svn: 182507
2013-05-22 19:28:41 +00:00
Benjamin Kramer
dced131e7e X86: When expanding PCMPGTQ to PCMPGTD we always want to compare the lower halves as unsigned.
Take #2 on fixing PR15977.

llvm-svn: 182486
2013-05-22 17:01:12 +00:00
David Majnemer
0c608d85e0 X86: Remove test instructions proceeding shift by immediate instructions
Allow LLVM to take advantage of shift instructions that set the ZF flag,
making instructions that test the destination superfluous.

llvm-svn: 182454
2013-05-22 08:13:02 +00:00
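
A small C example of the pattern affected (illustrative): the shift already sets
ZF, so after this change the explicit test of the shifted value should become
unnecessary.

    int high_bits_set(unsigned int x) {
      return (x >> 4) != 0;   /* shr sets ZF; no separate test needed */
    }
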
Akira Hatanaka
4da68c1676 [mips] Rename option to make it compatible with gcc.
llvm-svn: 182397
2013-05-21 17:17:59 +00:00
Akira Hatanaka
6123e22ce0 [mips] Add instruction selection patterns for blez and bgez.
llvm-svn: 182396
2013-05-21 17:13:47 +00:00
Justin Holewinski
2a53cbfbe1 [NVPTX] Add @llvm.nvvm.sqrt.f() intrinsic
llvm-svn: 182394
2013-05-21 16:51:30 +00:00
Justin Holewinski
e8d1165196 Drop @llvm.annotation and @llvm.ptr.annotation intrinsics during codegen.
The intrinsic calls are dropped, but the annotated value is propagated.

Fixes PR 15253

Original patch by Zeng Bin!

llvm-svn: 182387
2013-05-21 14:37:16 +00:00
Benjamin Kramer
6b339bee4e X86: When emulating unsigned PCMPGTQ with PCMPGTD, fix the sign bit for the smaller type.
Otherwise we'll get a mix of signed and unsigned compares.
Fixes PR15977.

llvm-svn: 182364
2013-05-21 09:58:54 +00:00
Richard Sandiford
10059d8172 [SystemZ] Tighten branch tests
After r182274, the branches in these tests must always be short.

llvm-svn: 182358
2013-05-21 08:53:17 +00:00
Benjamin Kramer
5872fe13c9 DAGCombine: Avoid an edge case where it tried to create an i0 type for (x & 0) == 0.
Fixes PR16083.

llvm-svn: 182357
2013-05-21 08:51:09 +00:00
Reed Kotler
81c5979b12 Add checks to the test case that the proper predefined stubs are being called.
These were accidentally omitted.

llvm-svn: 182347
2013-05-21 01:27:36 +00:00
Reed Kotler
2d5f41cb35 Add some additional functions to the list of helper functions for
pic calls. These need to be there so we don't try to use helper
functions when calling them.

As part of this, make sure that we properly exclude helper functions in pic
mode when indirect calls are involved.

llvm-svn: 182343
2013-05-21 00:50:30 +00:00
Akira Hatanaka
05711091c4 [mips] Add (setne $lhs, 0) instruction selection pattern.
llvm-svn: 182307
2013-05-20 18:18:07 +00:00
Akira Hatanaka
96de87d87c [mips] Trap on integer division by zero.
By default, a teq instruction is inserted after each integer divide. No divide-by-zero
checks are performed if the option "-mnocheck-zero-division" is used.

llvm-svn: 182306
2013-05-20 18:07:43 +00:00
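
A C sketch of what the new default means for ordinary division code
(illustrative): each divide is followed by a teq against the zero register
unless -mnocheck-zero-division is passed.

    int divide(int a, int b) {
      return a / b;   /* div ...; teq <divisor>, $zero -- traps if b == 0 */
    }
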
Justin Holewinski
2d2a08ee5e [NVPTX] Fix mis-use of CurrentFnSym in NVPTXAsmPrinter. This was causing a symbol name error in the output PTX.
llvm-svn: 182298
2013-05-20 16:42:18 +00:00
Tom Stellard
6cdacd1792 R600: Fix rotr.ll on non-asserts builds
The -debug-only option is only available on asserts builds.

llvm-svn: 182291
2013-05-20 15:28:48 +00:00
Tom Stellard
d72698331e R600/SI: Add pattern for rotr
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 182286
2013-05-20 15:02:24 +00:00
Tom Stellard
5ca265d214 R600: Swap the legality of rotl and rotr
The hardware supports rotr and not rotl.

llvm-svn: 182285
2013-05-20 15:02:19 +00:00
Tom Stellard
1609774b15 R600/SI: Add patterns for 64-bit shift operations
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
llvm-svn: 182284
2013-05-20 15:02:12 +00:00
Richard Sandiford
cc815ef1d8 [SystemZ] Add long branch pass
Before this change, the SystemZ backend would use BRCL for all branches
and only consider shortening them to BRC when generating an object file.
E.g. a branch on equal would use the JGE alias of BRCL in assembly output,
but might be shortened to the JE alias of BRC in ELF output.  This was
a useful first step, but it had two problems:

(1) The z assembler isn't traditionally supposed to perform branch shortening
    or branch relaxation.  We followed this rule by not relaxing branches
    in assembler input, but that meant that generating assembly code and
    then assembling it would not produce the same result as going directly
    to object code; the former would give long branches everywhere, whereas
    the latter would use short branches where possible.

(2) Other useful branches, like COMPARE AND BRANCH, do not have long forms.
    We would need to do something else before supporting them.

    (Although COMPARE AND BRANCH does not change the condition codes,
    the plan is to model COMPARE AND BRANCH as a CC-clobbering instruction
    during codegen, so that we can safely lower it to a separate compare
    and long branch where necessary.  This is not a valid transformation
    for the assembler proper to make.)

This patch therefore moves branch relaxation to a pre-emit pass.
For now, calls are still shortened from BRASL to BRAS by the assembler,
although this too is not really the traditional behaviour.

The first test takes about 1.5s to run, and there are likely to be
more tests in this vein once further branch types are added.  The feeling
on IRC was that 1.5s is a bit much for a single test, so I've restricted
it to SystemZ hosts for now.

The patch exposes (and fixes) some typos in the main CodeGen/SystemZ tests.
A later patch will remove the {{g}}s from that directory.

llvm-svn: 182274
2013-05-20 14:23:08 +00:00
Justin Holewinski
d5636664a4 [NVPTX] Add GenericToNVVM IR converter to better handle idiomatic LLVM IR inputs
This converter currently only handles global variables in address space 0.
These variables are promoted to address space 1 (global memory), and all
uses are updated to point to the result of a cvta.global instruction on the new
variable.

The motivation for this is that address space 0 global variables are illegal, since
we cannot declare variables in the generic address space.  Instead, we place the
variables in address space 1 and explicitly convert the pointer to address
space 0. This is primarily intended to help new users who expect to be able to
place global variables in the default address space.

llvm-svn: 182254
2013-05-20 12:13:32 +00:00
Justin Holewinski
fda22b94b1 [NVPTX] Fix i1 kernel parameters and global variables. ABI rules say we need to use .u8 for i1 parameters for kernels.
llvm-svn: 182253
2013-05-20 12:13:28 +00:00
Stepan Dyatkovskiy
adb91e7ed9 PR15868 fix.
Introduction:
In the case where the stack alignment is 8 and the size of the GPRs part of a
parameter is not a multiple of 8, we add padding to the GPRs part, so the
part's last byte must be recovered at address K*8-1.
We need to do this because the remaining (stack) part of the parameter starts
at address K*8, and we need to "attach" the "GPRs head" to it without gaps:

Stack:
|---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
[ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------ ...

FIX:
Note that once we add padding we need to correct *all* Arg offsets that come
after the padded one. That's why we need this fix: Arg offsets were never
corrected before this patch. See the new test cases included in the patch.

We also don't need to insert padding for byval parameters that are stored in GPRs
only. We need to pad only the last byval parameter, and only when it extends
outside the GPRs and the stack alignment is 8.
However, the stack area allocated for recovered byval params must satisfy the
"Size mod 8 = 0" restriction.

This patch reduces stack usage in some cases:
we can reduce the ArgRegsSaveArea, since byval params sized N*4 bytes may be
"packed" with alignment 4 in some cases.

llvm-svn: 182237
2013-05-20 08:01:34 +00:00
Jakob Stoklund Olesen
dad0022424 Also expand 64-bit bitcasts.
llvm-svn: 182229
2013-05-20 01:01:43 +00:00
Jakob Stoklund Olesen
e033165fd4 Implement spill and fill of I64Regs.
llvm-svn: 182228
2013-05-20 00:53:25 +00:00
Jakob Stoklund Olesen
f777e6d131 Mark i64 SETCC as expand so it is turned into a SELECT_CC.
llvm-svn: 182227
2013-05-20 00:28:36 +00:00
Jakob Stoklund Olesen
50d419ac93 Don't use %g0 to materialize 0 directly.
The wired physreg doesn't work on tied operands like on MOVXCC.

Add a README note to fix this later.

llvm-svn: 182225
2013-05-19 21:47:13 +00:00
Jakob Stoklund Olesen
f4ec84c2d4 Select i64 values with %icc conditions.
llvm-svn: 182224
2013-05-19 20:38:21 +00:00