llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00

Author	SHA1	Message	Date
Benjamin Kramer	9205d91a11	DAGCombiner: Generate a correct constant for vector types when folding (xor (and)) into (and (not)). PR15948. llvm-svn: 181597	2013-05-10 14:09:52 +00:00
Tom Stellard	7edf38bf1f	R600: Remove AMDILPeeopholeOptimizer and replace optimizations with tablegen patterns The BFE optimization was the only one we were actually using, and it was emitting an intrinsic that we don't support. https://bugs.freedesktop.org/show_bug.cgi?id=64201 Reviewed-by: Christian König <christian.koenig@amd.com> NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181580	2013-05-10 02:09:45 +00:00
Tom Stellard	ed363c57b2	R600: Expand SUB for v2i32/v4i32 Patch by: Aaron Watry Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181579	2013-05-10 02:09:39 +00:00
Tom Stellard	3ca3d250c6	R600: Expand MUL for v4i32/v2i32 Fixes piglit test for OpenCL builtin mul24, and allows mad24 to run. Patch by: Aaron Watry Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181578	2013-05-10 02:09:34 +00:00
Tom Stellard	56fef8261c	R600: Expand SRA for v4i32/v2i32 v2: Add v4i32 test Patch by: Aaron Watry Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181577	2013-05-10 02:09:29 +00:00
Tom Stellard	5d4a5a0d37	R600: Expand vselect for v4i32 and v2i32 v2: Add vselect v4i32 test Patch by: Aaron Watry Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> NOTE: This is a candidate for the 3.3 branch. llvm-svn: 181576	2013-05-10 02:09:24 +00:00
Owen Anderson	e34b034b7d	Teach SelectionDAG to constant fold all-constant FMA nodes the same way that it constant folds FADD, FMUL, etc. llvm-svn: 181555	2013-05-09 22:27:13 +00:00
Bill Wendling	59bb428cc9	Generate a compact unwind encoding in the face of a stack alignment push. We generate a `push' of a random register (%rax) if the stack needs to be aligned by the size of that register. However, this could mess up compact unwind generation. In particular, we want to still generate compact unwind in the presence of this monstrosity. Check if the push of of the %rax/%eax register. If it is and it's marked with the `FrameSetup' flag, then we can generate a compact unwind encoding for the function only if the push is the last FrameSetup instruction. llvm-svn: 181540	2013-05-09 20:10:38 +00:00
Jyotsna Verma	15d448ba0c	Hexagon: Use relation map for getMatchingCondBranchOpcode() and getInvertedPredicatedOpcode() functions instead of switch cases. llvm-svn: 181530	2013-05-09 18:25:44 +00:00
Richard Osborne	7504cb9f47	[XCore] Fix handling of functions where only the LR is spilled. Previously we only checked if the LR required saving if the frame size was non zero. However because the caller reserves 1 word for the callee to use that doesn't count towards our frame size it is possible for the LR to need saving and for the frame size to be 0. We didn't hit when the LR needed saving because of a function calls because the 1 word of stack we must allocate for our callee means the frame size is always non zero in this case. However we can hit this case if the LR is clobbered in inline asm. llvm-svn: 181520	2013-05-09 16:43:42 +00:00
Akira Hatanaka	a1d814e7b8	[mips] Add instruction selection pattern for (seteq $LHS, 0). llvm-svn: 181459	2013-05-08 19:38:04 +00:00
Bill Schmidt	7f1a2b5212	Fix handling of anonymous aggregate parameters for powerpc*-apple-darwin8. This fixes bug 15821 similarly to the powerpc64-linux fix for bug 14779. Patch by David Fang. llvm-svn: 181449	2013-05-08 17:22:33 +00:00
Michel Danzer	e95e9672c2	R600/SI: Add lit tests for llvm.SI.imageload and llvm.SI.resinfo intrinsics Adapted from the llvm.SI.sample test. Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 181425	2013-05-08 13:07:29 +00:00
Hal Finkel	1636de8210	PPCInstrInfo::optimizeCompareInstr should not optimize FP compares The floating-point record forms on PPC don't set the condition register bits based on a comparison with zero (like the integer record forms do), but rather based on the exception status bits. llvm-svn: 181423	2013-05-08 12:16:14 +00:00
Nick Lewycky	a88ff03516	Fix a bug in codegenprep where it was losing track of values OptimizeMemoryInst by switching to a ValueMap. Patch by Andrea DiBiagio! llvm-svn: 181397	2013-05-08 09:00:10 +00:00
David Majnemer	fdba035711	DAGCombiner: Simplify inverted bit tests Fold (xor (and x, y), y) -> (and (not x), y) This removes an opportunity for a constant to appear twice. llvm-svn: 181395	2013-05-08 06:44:42 +00:00
Jyotsna Verma	37863260ff	Hexagon: Fix Small Data support to handle -G 0 correctly. llvm-svn: 181344	2013-05-07 19:53:00 +00:00
Jyotsna Verma	5307666fe8	Reverting r181331. Missing file, HexagonSplitConst32AndConst64.cpp, from lib/Target/Hexagon/CMakeLists.txt. llvm-svn: 181334	2013-05-07 17:12:35 +00:00
Jyotsna Verma	af0c734e1b	Hexagon: Fix Small Data support to handle -G 0 correctly. llvm-svn: 181331	2013-05-07 16:42:15 +00:00
Bill Wendling	0c1e625af5	Reduce attributes. llvm-svn: 181245	2013-05-06 20:57:23 +00:00
Tom Stellard	fb8e73f3af	R600: Emit config values in register / value pairs Reviewed-by: Vincent Lejeune <vljn@ovi.com> Tested-By: Aaron Watry <awatry@gmail.com> llvm-svn: 181228	2013-05-06 17:50:51 +00:00
Tom Stellard	6c3f6e1b02	R600: Stop emitting the instruction type byte before each instruction Reviewed-by: Vincent Lejeune <vljn@ovi.com> Tested-By: Aaron Watry <awatry@gmail.com> llvm-svn: 181225	2013-05-06 17:50:44 +00:00
Tom Stellard	ebe049fd75	R600: Emit ISA for CALL_FS_* instructions Reviewed-by: Vincent Lejeune <vljn@ovi.com> Tested-By: Aaron Watry <awatry@gmail.com> llvm-svn: 181223	2013-05-06 17:50:26 +00:00
Ulrich Weigand	1431b3c2f5	[SystemZ] Add CodeGen test cases This adds all CodeGen tests for the SystemZ target. This version of the patch incorporates feedback from a review by Sean Silva. Thanks to all reviewers! Patch by Richard Sandiford. llvm-svn: 181204	2013-05-06 16:17:29 +00:00
Michael Kuperstein	74685eff73	Fix slightly too aggressive conact_vector optimization. (Would sometimes optimize away conacts used to extend a vector with undef values) llvm-svn: 181186	2013-05-06 08:06:13 +00:00
Bill Wendling	454b468436	Add a testcase that checks that we generate functions with frame pointers or not depending upon the function attributes. llvm-svn: 181180	2013-05-06 05:45:57 +00:00
Evan Cheng	2fcdb53946	Test case for r181160 and r181161. rdar://13782395 llvm-svn: 181162	2013-05-05 18:07:15 +00:00
Stepan Dyatkovskiy	c06cd03f6e	For ARM backend, fixed "byval" attribute support. Now even the small structures could be passed within byval (small enough to be stored in GPRs). In regression tests next function prototypes are checked: PR15293: %artz = type { i32 } define void @foo(%artz* byval %s) define void @foo2(%artz* byval %s, i32 %p, %artz* byval %s2) foo: "s" stored in R0 foo2: "s" stored in R0, "s2" stored in R2. Next AAPCS rules are checked: 5.5 Parameters Passing, C.4 and C.5, "ParamSize" is parameter size in 32bit words: -- NSAA != 0, NCRN < R4 and NCRN+ParamSize > R4. Parameter should be sent to the stack; NCRN := R4. -- NSAA != 0, and NCRN < R4, NCRN+ParamSize < R4. Parameter stored in GPRs; NCRN += ParamSize. llvm-svn: 181148	2013-05-05 07:48:36 +00:00
David Majnemer	d5ba4da281	Remove a recently redundant transform from X86ISelLowering. X86ISelLowering has support to treat: (icmp ne (and (xor %flags, -1), (shl 1, flag)), 0) as if it were actually: (icmp eq (and %flags, (shl 1, flag)), 0) However, r179386 has code at the InstCombine level to handle this. llvm-svn: 181145	2013-05-05 02:00:10 +00:00
Tim Northover	d4f2cac7b6	AArch64: support literal pool access in large memory model. llvm-svn: 181120	2013-05-04 16:54:07 +00:00
Tim Northover	4ef2500d01	AArch64: support large code model for jump-tables llvm-svn: 181119	2013-05-04 16:54:00 +00:00
Tim Northover	ece66eacb2	AArch64: implement support for blockaddress in large code model llvm-svn: 181118	2013-05-04 16:53:53 +00:00
Tim Northover	87645e02c0	AArch64: implement large code model access to global variables. The MOVZ/MOVK instruction sequence may not be the most efficient (a literal-pool load could be better) but adding that would require reinstating the ConstantIslands pass. For now the sequence is correct, and that's enough. Beware, as of commit GNU ld does not appear to support the relocations needed for this. Its primary purpose (for now) will be to support JITed code, since in that case there is no guarantee of where your code will end up in memory relative to external symbols it references. llvm-svn: 181117	2013-05-04 16:53:46 +00:00
Reed Kotler	b89d9a0181	Remove some uneeded pseudos in the presence of the naked function attribute. llvm-svn: 181072	2013-05-03 23:17:24 +00:00
Akira Hatanaka	5f295bccfc	[mips] Split the DSP control register and define one register for each field of its fields. This removes false dependencies between DSP instructions which access different fields of the the control register. Implicit register operands are added to instructions RDDSP and WRDSP after instruction selection, depending on the value of the mask operand. llvm-svn: 181041	2013-05-03 18:37:49 +00:00
Tom Stellard	2165728987	R600: Expand vector or, shl, srl, and xor nodes llvm-svn: 181035	2013-05-03 17:21:31 +00:00
Tom Stellard	f2fd0109a0	R600: Add pattern for SHA-256 Ma function This can be optimized using the BFI_INT instruction. llvm-svn: 181033	2013-05-03 17:21:20 +00:00
Akira Hatanaka	ab6ee99fe0	[mips] Handle reading, writing or copying of ccond field of DSP control register. - Define pseudo instructions which store or load ccond field of the DSP control register. - Emit the pseudos in MipsSEInstrInfo::storeRegToStack and loadRegFromStack. - Expand the pseudos before callee-scan save. - Emit instructions RDDSP or WRDSP to copy between ccond field and GPRs. llvm-svn: 180969	2013-05-02 23:07:05 +00:00
Vincent Lejeune	5d7b2a4aea	R600: Signed literals are 64bits wide llvm-svn: 180960	2013-05-02 21:53:03 +00:00
Vincent Lejeune	3ff31b75b3	R600: If previous bundle is dot4, PV valid chan is always X llvm-svn: 180959	2013-05-02 21:52:55 +00:00
Vincent Lejeune	97fb65a788	R600: Add a test to check that use_kill is emitted llvm-svn: 180958	2013-05-02 21:52:46 +00:00
Vincent Lejeune	62da1453e1	R600: Prettier asmPrint of Alu llvm-svn: 180956	2013-05-02 21:52:30 +00:00
Pranav Bhandarkar	520d26e773	Hexagon - Add peephole optimizations for zero extends. * lib/Target/Hexagon/HexagonInstrInfo.td: Add patterns to combine a sequence of a pair of i32->i64 extensions followed by a "bitwise or" into COMBINE_rr. * lib/Target/Hexagon/HexagonPeephole.cpp: Copy propagate Rx in the instruction Rp = COMBINE_Ir_V4(0, Rx) to the uses of Rp:subreg_loreg. * test/CodeGen/Hexagon/union-1.ll: New test. * test/CodeGen/Hexagon/combine_ir.ll: Fix test. llvm-svn: 180946	2013-05-02 20:22:51 +00:00
Manman Ren	0e2c14381d	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180935	2013-05-02 18:11:35 +00:00
Michael Liao	ca62375d96	Rewrite X86 codegen regression test with FileCheck llvm-svn: 180910	2013-05-02 06:20:42 +00:00
Michael Liao	ec28235c2a	Avoid generating tempfile(s) never used As DejaGNU is deprecated, it seems pipe-jam issue doesn't exist any more. llvm-svn: 180892	2013-05-01 22:46:50 +00:00
Bill Wendling	218b457a2f	Revert r180737. The companion patch was reverted, and this is not relevant right now. llvm-svn: 180889	2013-05-01 22:32:08 +00:00
Nadav Rotem	d62966b79d	Optimize away nop CONCAT_VECTOR nodes. Optimize CONCAT_VECTOR nodes that merge EXTRACT_SUBVECTOR values that extract from the same vector. rdar://13402653 PR15866 llvm-svn: 180871	2013-05-01 19:18:51 +00:00
Rafael Espindola	056abc9e26	Put VMOVPQIto64rr in the VRPDI class. Patch by Joshua Magee. llvm-svn: 180842	2013-05-01 13:00:16 +00:00
Michael Liao	cff6c527b8	Forget remove the tempfile argument llvm-svn: 180838	2013-05-01 05:45:57 +00:00
Michael Liao	349e772e4f	More rewrites of x86 codegen regression tests with FileCheck llvm-svn: 180837	2013-05-01 05:34:30 +00:00
Akira Hatanaka	f5c940dea8	[mips] Fix handling of instructions which copy to/from accumulator registers. Expand copy instructions between two accumulator registers before callee-saved scan is done. Handle copies between integer GPR and hi/lo registers in MipsSEInstrInfo::copyPhysReg. Delete pseudo-copy instructions that are not needed. llvm-svn: 180827	2013-04-30 23:22:09 +00:00
Stephen Lin	84b2d4dbd4	Only pass 'returned' to target-specific lowering code when the value of entire register is guaranteed to be preserved. llvm-svn: 180825	2013-04-30 22:49:28 +00:00
Akira Hatanaka	0bca7f3584	[mips] Instruction selection patterns for DSP-ASE vector select and compare instructions. llvm-svn: 180820	2013-04-30 22:37:26 +00:00
Adrian Prantl	7482401c1d	Temporarily revert "Change the informal convention of DBG_VALUE so that we can express a" because it breaks some buildbots. This reverts commit 180816. llvm-svn: 180819	2013-04-30 22:35:14 +00:00
Adrian Prantl	baf0a98faa	Change the informal convention of DBG_VALUE so that we can express a register-indirect address with an offset of 0. It used to be that a DBG_VALUE is a register-indirect value if the offset (operand 1) is nonzero. The new convention is that a DBG_VALUE is register-indirect if the first operand is a register and the second operand is an immediate. For plain registers use the combination reg, reg. rdar://problem/13658587 llvm-svn: 180816	2013-04-30 22:16:46 +00:00
Hal Finkel	2ac40ae0d3	LocalStackSlotAllocation improvements First, taking advantage of the fact that the virtual base registers are allocated in order of the local frame offsets, remove the quadratic register-searching behavior. Because of the ordering, we only need to check the last virtual base register created. Second, store the frame index in the FrameRef structure, and get the frame index and the local offset from this structure at the top of the loop iteration. This allows us to de-nest the loops in insertFrameReferenceRegisters (and I think makes the code cleaner). I also moved the needsFrameBaseReg check into the first loop over instructions so that we don't bother pushing FrameRefs for instructions that don't want a virtual base register anyway. Lastly, and this is the only functionality change, avoid the creation of single-use virtual base registers. These are currently not useful because, in general, they end up replacing what would be one r+r instruction with an add and a r+i instruction. Committing this removes the XFAIL in CodeGen/PowerPC/2007-09-07-LoadStoreIdxForms.ll Jim has okayed this off-list. llvm-svn: 180799	2013-04-30 20:04:37 +00:00
Manman Ren	0b37dd0efc	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180796	2013-04-30 17:52:57 +00:00
Vincent Lejeune	25352bd54f	R600: fix loop-address.ll test Texture cache is now used when shader type is not specified llvm-svn: 180785	2013-04-30 12:47:56 +00:00
Michael Liao	c67e0fc9ea	Rewrite X86 codegen regression test with FileCheck llvm-svn: 180776	2013-04-30 07:51:08 +00:00
Vincent Lejeune	29f24e0ce8	R600: use native for alu llvm-svn: 180761	2013-04-30 00:14:38 +00:00
Vincent Lejeune	e641cd06c9	R600: Add FetchInst bit to instruction defs to denote vertex/tex instructions v2[Vincent Lejeune]: Split FetchInst into usesTextureCache/usesVertexCache llvm-svn: 180755	2013-04-30 00:13:39 +00:00
Michael Liao	3db1a24464	Rewrite test in FileCheck instead of grep in X86 codegen llvm-svn: 180754	2013-04-30 00:13:38 +00:00
Manman Ren	180923f053	TBAA: remove !tbaa from testing cases if not used. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 180745	2013-04-29 22:58:55 +00:00
Bill Wendling	e24a89874b	Duplicate a testcase. llvm-svn: 180744	2013-04-29 22:42:47 +00:00
Michael Liao	4e03c7690d	Rewrite some tests with FileCHeck in X86 codegen - Revise previous patches of the same purpose by fixing ) grep <PA> \| not grep <PB> semantically is not the same as CHECK: <PA>{{^<PB>.$}} as the former will check all occurrences of <PA> while the later only check the first match. As the result, CHECK needs putting in all place where <PA> occurs. *) grep <PA> \| count <N> needs a final CHECK-NOT of the same pattern. (As 'CHECK-<N>' is proposed for discussion, converting 'grep \| count <N>' where N > 1 is postponed.) llvm-svn: 180742	2013-04-29 22:41:29 +00:00
Tom Stellard	33e7a52e1c	R600: Use correct CF_END instruction on Northern Island GPUs llvm-svn: 180735	2013-04-29 22:23:58 +00:00
Tom Stellard	a22d2b47f3	R600: Fix encoding of CF_END_{EG, R600} instructions The EOP bit was not being encoded. llvm-svn: 180734	2013-04-29 22:23:54 +00:00
Rafael Espindola	ef355ff0c1	Make all darwin ppc stubs local. This fixes pr15763. Patch by David Fang. llvm-svn: 180657	2013-04-27 00:43:16 +00:00
Benjamin Kramer	583d3f2591	Make CHECK lines a bit less strict so they also match code generated for win64. Hopefully brings the windows buildbots back to life. llvm-svn: 180630	2013-04-26 21:04:21 +00:00
Tom Stellard	de2ad0a8f1	R600: Initialize AMDGPUMachineFunction::ShaderType to ShaderType::COMPUTE We need to intialize this to something and since clang does not set the shader type attribute and clang is used only for compute shaders, initializing it to COMPUTE seems like the best choice. Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 180620	2013-04-26 18:32:24 +00:00
Benjamin Kramer	40a2d53c85	ARM/NEON: Pattern match vector integer abs to vabs. llvm-svn: 180604	2013-04-26 15:00:57 +00:00
Benjamin Kramer	11723aa321	X86: Now that we have a canonical form for vector integer abs, match it into pabs. llvm-svn: 180600	2013-04-26 12:05:21 +00:00
Benjamin Kramer	7ce75fb032	DAGCombiner: Canonicalize vector integer abs in the same way we do it for scalars. This already helps SSE2 x86 a lot because it lacks an efficient way to represent a vector select. The long term goal is to enable the backend to match a canonicalized pattern into a single instruction (e.g. vabs or pabs). llvm-svn: 180597	2013-04-26 09:19:19 +00:00
Preston Gurd	0547d81fdb	This patch adds the X86FixupLEAs pass, which will reduce instruction latency for certain models of the Intel Atom family, by converting instructions into their equivalent LEA instructions, when it is both useful and possible to do so. llvm-svn: 180573	2013-04-25 20:29:37 +00:00
Chad Rosier	3030f76a0d	[inline asm] Add a test case for r180226. The specific issue is that the inline assembly is requesting a 64-bit register, which is invalid for i386. rdar://13731657 llvm-svn: 180445	2013-04-25 17:10:21 +00:00
Silviu Baranga	c656dd9bdd	Fix constant folding for one lane vector types. Constant folding one lane vector types not returns a vector instead of a scalar. llvm-svn: 180254	2013-04-25 09:32:33 +00:00
Tom Stellard	48d161332e	R600: Use SHT_PROGBITS for the .AMDGPU.config section The libelf implementation that is distributed here: http://www.mr511.de/software/english.html will not parse sections that are marked SHT_NULL. llvm-svn: 180230	2013-04-24 23:56:14 +00:00
Andrew Trick	73014520d6	MI Sched: eliminate local vreg copies. For now, we just reschedule instructions that use the copied vregs and let regalloc elliminate it. I would really like to eliminate the copies on-the-fly during scheduling, but we need a complete implementation of repairIntervalsInRange() first. The general strategy is for the register coalescer to eliminate as many global copies as possible and shrink live ranges to be extended-basic-block local. The coalescer should not have to worry about resolving local copies (e.g. it shouldn't attemp to reorder instructions). The scheduler is a much better place to deal with local interference. The coalescer side of this equation needs work. llvm-svn: 180193	2013-04-24 15:54:43 +00:00
Jyotsna Verma	4a5d195942	Hexagon: Use multiclass for combine and STri[bhwd]_shl_V4 instructions. llvm-svn: 180145	2013-04-23 21:17:40 +00:00
Stephen Lin	44a24c9593	Add more tests for r179925 to verify correct handling of signext/zeroext; strengthen condition check to require actual MVT::i32 virtual register types, just in case (no actual functionality change) llvm-svn: 180138	2013-04-23 19:42:25 +00:00
Jyotsna Verma	624f2e0434	Hexagon: Remove assembler mapped instruction definitions. llvm-svn: 180133	2013-04-23 19:15:55 +00:00
Vincent Lejeune	3666f07489	R600: Use .AMDGPU.config section to emit stacksize llvm-svn: 180124	2013-04-23 17:34:12 +00:00
Vincent Lejeune	e5ba5f1b14	R600: Add CF_END llvm-svn: 180123	2013-04-23 17:34:00 +00:00
Jyotsna Verma	20903a7aba	Hexagon: Remove duplicate instructions to handle global/immediate values for absolute/absolute-set addressing modes. llvm-svn: 180120	2013-04-23 17:11:46 +00:00
Rafael Espindola	f7c86d97a1	Move test from grep to FileCheck. llvm-svn: 180092	2013-04-23 12:03:27 +00:00
Akira Hatanaka	913bf6194a	[mips] In performDSPShiftCombine, check that all elements in the vector are shifted by the same amount and the shift amount is smaller than the element size. llvm-svn: 180039	2013-04-22 19:58:23 +00:00
Stephen Lin	4e394628e7	Extra paranoid test for r179925 (verify that tail calls are not generated to 'this'-returning constructors of objects with different 'this' pointers than the caller) llvm-svn: 180032	2013-04-22 17:23:49 +00:00
Stepan Dyatkovskiy	8adaf54376	Fix for 5.5 Parameter Passing --> Stage C: -- C.4 and C.5 statements, when NSAA is not equal to SP. -- C.1.cp statement for VA functions. Note: There are no VFP CPRCs in a variadic procedure. Before this patch "NSAA != 0" means "don't use GPRs anymore ". But there are some exceptions in AAPCS. 1. For non VA function: allocate all VFP regs for CPRC. When all VFPs are allocated CPRCs would be sent to stack, while non CPRCs may be still allocated in GRPs. 2. Check that for VA functions all params uses GPRs and then stack. No exceptions, no CPRCs here. llvm-svn: 180011	2013-04-22 13:06:52 +00:00
Arnaud A. de Grandmaison	087fe129d8	Cleanup: test source files do not need to be executable llvm-svn: 180003	2013-04-22 08:02:43 +00:00
David Blaikie	9bfe15c313	Revert "Revert "PR14606: debug info imported_module support"" This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even though the debug info was clearly invalid on all of them, but this ought to fix it. llvm-svn: 179996	2013-04-22 06:12:31 +00:00
Jim Grosbach	3104dcf2ca	Legalize vector truncates by parts rather than just splitting. Rather than just splitting the input type and hoping for the best, apply a bit more cleverness. Just splitting the types until the source is legal often leads to an illegal result time, which is then widened and a scalarization step is introduced which leads to truly horrible code generation. With the loop vectorizer, these sorts of operations are much more common, and so it's worth extra effort to do them well. Add a legalization hook for the operands of a TRUNCATE node, which will be encountered after the result type has been legalized, but if the operand type is still illegal. If simple splitting of both types ends up with the result type of each half still being legal, just do that (v16i16 -> v16i8 on ARM, for example). If, however, that would result in an illegal result type (v8i32 -> v8i8 on ARM, for example), we can get more clever with power-two vectors. Specifically, split the input type, but also widen the result element size, then concatenate the halves and truncate again. For example on ARM, To perform a "%res = v8i8 trunc v8i32 %in" we transform to: %inlo = v4i32 extract_subvector %in, 0 %inhi = v4i32 extract_subvector %in, 4 %lo16 = v4i16 trunc v4i32 %inlo %hi16 = v4i16 trunc v4i32 %inhi %in16 = v8i16 concat_vectors v4i16 %lo16, v4i16 %hi16 %res = v8i8 trunc v8i16 %in16 This allows instruction selection to generate three VMOVN instructions instead of a sequences of moves, stores and loads. Update the ARMTargetTransformInfo to take this improved legalization into account. Consider the simplified IR: define <16 x i8> @test1(<16 x i32>* %ap) { %a = load <16 x i32>* %ap %tmp = trunc <16 x i32> %a to <16 x i8> ret <16 x i8> %tmp } define <8 x i8> @test2(<8 x i32>* %ap) { %a = load <8 x i32>* %ap %tmp = trunc <8 x i32> %a to <8 x i8> ret <8 x i8> %tmp } Previously, we would generate the truly hideous: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: push {r7} mov r7, sp sub sp, sp, #20 bic sp, sp, #7 add r1, r0, #48 add r2, r0, #32 vld1.64 {d24, d25}, [r0:128] vld1.64 {d16, d17}, [r1:128] vld1.64 {d18, d19}, [r2:128] add r1, r0, #16 vmovn.i32 d22, q8 vld1.64 {d16, d17}, [r1:128] vmovn.i32 d20, q9 vmovn.i32 d18, q12 vmov.u16 r0, d22[3] strb r0, [sp, #15] vmov.u16 r0, d22[2] strb r0, [sp, #14] vmov.u16 r0, d22[1] strb r0, [sp, #13] vmov.u16 r0, d22[0] vmovn.i32 d16, q8 strb r0, [sp, #12] vmov.u16 r0, d20[3] strb r0, [sp, #11] vmov.u16 r0, d20[2] strb r0, [sp, #10] vmov.u16 r0, d20[1] strb r0, [sp, #9] vmov.u16 r0, d20[0] strb r0, [sp, #8] vmov.u16 r0, d18[3] strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] vldmia sp, {d16, d17} vmov r0, r1, d16 vmov r2, r3, d17 mov sp, r7 pop {r7} bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: push {r7} mov r7, sp sub sp, sp, #12 bic sp, sp, #7 vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d20, d21}, [r0:128] vmovn.i32 d18, q8 vmov.u16 r0, d18[3] vmovn.i32 d16, q10 strb r0, [sp, #3] vmov.u16 r0, d18[2] strb r0, [sp, #2] vmov.u16 r0, d18[1] strb r0, [sp, #1] vmov.u16 r0, d18[0] strb r0, [sp] vmov.u16 r0, d16[3] strb r0, [sp, #7] vmov.u16 r0, d16[2] strb r0, [sp, #6] vmov.u16 r0, d16[1] strb r0, [sp, #5] vmov.u16 r0, d16[0] strb r0, [sp, #4] ldm sp, {r0, r1} mov sp, r7 pop {r7} bx lr Now, however, we generate the much more straightforward: .syntax unified .section __TEXT,__text,regular,pure_instructions .globl _test1 .align 2 _test1: @ @test1 @ BB#0: add r1, r0, #48 add r2, r0, #32 vld1.64 {d20, d21}, [r0:128] vld1.64 {d16, d17}, [r1:128] add r1, r0, #16 vld1.64 {d18, d19}, [r2:128] vld1.64 {d22, d23}, [r1:128] vmovn.i32 d17, q8 vmovn.i32 d16, q9 vmovn.i32 d18, q10 vmovn.i32 d19, q11 vmovn.i16 d17, q8 vmovn.i16 d16, q9 vmov r0, r1, d16 vmov r2, r3, d17 bx lr .globl _test2 .align 2 _test2: @ @test2 @ BB#0: vld1.64 {d16, d17}, [r0:128] add r0, r0, #16 vld1.64 {d18, d19}, [r0:128] vmovn.i32 d16, q8 vmovn.i32 d17, q9 vmovn.i16 d16, q8 vmov r0, r1, d16 bx lr llvm-svn: 179989	2013-04-21 23:47:41 +00:00
Jim Grosbach	2582e2e539	ARM: Split out cost model vcvt testcases. They had a separate RUN line already, so may as well be in a separate file. llvm-svn: 179988	2013-04-21 23:47:37 +00:00
Jakob Stoklund Olesen	c9f30e9065	Passing arguments to varags functions under the SPARC v9 ABI. Arguments after the fixed arguments never use the floating point registers. llvm-svn: 179987	2013-04-21 21:36:49 +00:00
Jakob Stoklund Olesen	d8a2b84611	Fix the SETHIimm pattern for 64-bit code. Don't ignore the high 32 bits of the immediate. llvm-svn: 179985	2013-04-21 21:18:03 +00:00
Tim Northover	593f76e08e	ARM: fix part of test which actually needed an asserts build This should fix a buildbot failure that occurred after r179977. llvm-svn: 179978	2013-04-21 12:20:19 +00:00
Tim Northover	943f2a9234	ARM: Use ldrd/strd to spill 64-bit pairs when available. This allows common sp-offsets to be part of the instruction and is probably faster on modern CPUs too. llvm-svn: 179977	2013-04-21 11:57:07 +00:00
Bill Wendling	61eb6957c5	Remove tbaa metadata. llvm-svn: 179970	2013-04-21 01:38:25 +00:00
Jakob Stoklund Olesen	cbfbef04da	Compile varargs functions for SPARCv9. With a little help from the frontend, it looks like the standard va_* intrinsics can do the job. Also clean up an old bitcast hack in LowerVAARG that dealt with unaligned double loads. Load SDNodes can specify an alignment now. Still missing: Calling varargs functions with float arguments. llvm-svn: 179961	2013-04-20 22:49:16 +00:00
Tim Northover	de5285eb6f	ARM: don't add FrameIndex offset for LDMIA (has no immediate) Previously, when spilling 64-bit paired registers, an LDMIA with both a FrameIndex and an offset was produced. This kind of instruction shouldn't exist, and the extra operand was being confused with the predicate, causing aborts later on. This removes the invalid 0-offset from the instruction being produced. llvm-svn: 179956	2013-04-20 19:31:00 +00:00
Stephen Lin	98df7358cd	Minor renaming of tests (for consistency with an in-development patch) llvm-svn: 179954	2013-04-20 16:21:26 +00:00
Benjamin Kramer	a7e8f887fe	Don't litter .s files in test directory. llvm-svn: 179937	2013-04-20 10:43:40 +00:00
Stephen Lin	9d99ba2071	Add CodeGen support for functions that always return arguments via a new parameter attribute 'returned', which is taken advantage of in target-independent tail call opportunity detection and in ARM call lowering (when placed on an integral first parameter). llvm-svn: 179925	2013-04-20 05:14:40 +00:00
Stephen Lin	65c1101eba	Allow tail call opportunity detection through nested and/or multiple iterations of extractelement/insertelement indirection llvm-svn: 179924	2013-04-20 04:27:51 +00:00
Akira Hatanaka	11b4211d68	[mips] Instruction selection patterns for DSP-ASE vector shifts. llvm-svn: 179906	2013-04-19 23:21:32 +00:00
Hal Finkel	9e44a50443	Fix PPC optimizeCompareInstr swapped-sub argument handling When matching a compare with a subtract where the arguments of the compare are swapped w.r.t. the arguments of the subtract, we need to negate the predicates (or CR bit indices) of the users. This, however, is not the same as inverting the predicate (negating LT -> GT, but inverting LT -> GE, for example). The ARM backend seems to do this correctly, but when I adapted the code for the PPC backend, I introduced an error in this logic. Comparison optimization is now enabled again by default. llvm-svn: 179899	2013-04-19 22:08:38 +00:00
Anton Korobeynikov	f95220dd8b	Do not mangle in MS-way the globals with magic \001 in the name. Based on the patch by David Nadlinger! llvm-svn: 179889	2013-04-19 21:20:56 +00:00
Bill Wendling	e8c6d1cb09	Make test slightly more readable. llvm-svn: 179888	2013-04-19 21:14:59 +00:00
Bill Wendling	7256108f6f	Add a testcase to make sure we generate the proper compact unwind section for a function that cannot produce a compact unwind encoding. llvm-svn: 179887	2013-04-19 21:07:11 +00:00
Eric Christopher	88bdd26cc9	Revert "PR14606: debug info imported_module support" This reverts commit r179836 as it seems to have caused test failures. llvm-svn: 179840	2013-04-19 07:47:16 +00:00
David Blaikie	46f35f8e56	PR14606: debug info imported_module support Adding another CU-wide list, in this case of imported_modules (since they should be relatively rare, it seemed better to add a list where each element had a "context" value, rather than add a (usually empty) list to every scope). This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll need to expand this to cover DW_TAG_imported_declaration too. llvm-svn: 179836	2013-04-19 06:57:04 +00:00
Tom Stellard	017c53ebbd	R600: Add pattern for the BFI_INT instruction llvm-svn: 179830	2013-04-19 02:11:06 +00:00
Tom Stellard	b767059700	R600: Reorganize lit tests and document how they should be organized llvm-svn: 179828	2013-04-19 02:10:53 +00:00
Hal Finkel	b96e61374f	Disable PPC comparison optimization by default This seems to cause a stage-2 LLVM compile failure (by crashing TableGen); do I'm disabling this for now. llvm-svn: 179807	2013-04-18 22:54:25 +00:00
Hal Finkel	44190578df	Implement optimizeCompareInstr for PPC Many PPC instructions have a so-called 'record form' which stores to a specific condition register the result of comparing the result of the instruction with zero (always as a signed comparison). For integer operations on PPC64, this is always a 64-bit comparison. This implementation is derived from the implementation in the ARM backend; there are some differences because PPC condition registers are allocatable virtual registers (although the record forms always use a specific one), and we look for a matching subtraction instruction after the compare (but before the first use) in addition to before it. llvm-svn: 179802	2013-04-18 22:15:08 +00:00
Benjamin Kramer	aeff9e581b	X86: Add an SSE2 lowering for 64 bit compares when pcmpgtq (SSE4.2) isn't available. This pattern started popping up in vectorized min/max reductions. llvm-svn: 179797	2013-04-18 21:37:45 +00:00
Derek Schuff	c55a3d43a9	Allow misaligned stores in x86 fast-isel. In X86FastISel::X86SelectStore(), improperly aligned stores are rejected and handled by the DAG-based ISel. However, X86FastISel::X86SelectLoad() makes no such requirement. There doesn't appear to be an x86 architectural correctness issue with allowing potentially unaligned store instructions. This patch removes this restriction. Patch by Jim Stichnot. llvm-svn: 179774	2013-04-18 17:41:08 +00:00
Hao Liu	ca09ec237c	Fix for PR14824, An ARM Load/Store Optimization bug llvm-svn: 179751	2013-04-18 09:11:08 +00:00
Eli Bendersky	802610971f	This patch teaches x86 fast-isel to generate the native div/idiv instructions for the sdiv/srem/udiv/urem bitcode instructions. This is done for the i8, i16, and i32 types, as well as i64 for the x86_64 target. Patch by Jim Stichnoth llvm-svn: 179715	2013-04-17 20:10:13 +00:00
Vincent Lejeune	cd0483fb18	R600: Make Export Instruction not duplicable llvm-svn: 179686	2013-04-17 15:17:39 +00:00
Richard Osborne	3bc2e6cf63	[XCore] Extend test to check positve offsets are folded into addresses. llvm-svn: 179621	2013-04-16 20:05:52 +00:00
Richard Osborne	d8d60d4b61	[XCore] Give test more generic name. I intend to extend the test with more offset folding checks llvm-svn: 179620	2013-04-16 19:56:55 +00:00
Richard Osborne	53ec25c8fa	[XCore] Convert a couple of tests to FileCheck. llvm-svn: 179619	2013-04-16 19:41:19 +00:00
Logan Chien	6f13ff357d	Implement ARM unwind opcode assembler. llvm-svn: 179591	2013-04-16 12:02:21 +00:00
Jakob Stoklund Olesen	b4edc00933	Add 64-bit multiply and divide instructions for SPARC v9. llvm-svn: 179582	2013-04-16 02:57:02 +00:00
Tom Stellard	bd67f8cd81	R600/SI: Emit config values in register value pairs. Instead of emitting config values in a predefined order, the code emitter will now emit a 32-bit register index followed by the 32-bit config value. llvm-svn: 179546	2013-04-15 17:51:35 +00:00
Tom Stellard	a44e2e18a1	R600/SI: Emit configuration value in the .AMDGPU.config ELF section llvm-svn: 179545	2013-04-15 17:51:30 +00:00
Tom Stellard	cb4468b00a	R600: Emit ELF formatted code rather than raw ISA. llvm-svn: 179544	2013-04-15 17:51:21 +00:00
Tim Northover	b5dc8bb136	Avoid outputting temporary test file into source tree. llvm-svn: 179532	2013-04-15 15:49:13 +00:00
Hal Finkel	371be65604	Fix PPC64 CR spill location for callee-saved registers This fixes an ABI bug for non-Darwin PPC64. For the callee-saved condition registers, the spill location is specified relative to the stack pointer (SP + 8). However, this is not relative to the SP after the new stack frame is established, but instead relative to the caller's stack pointer (it is stored into the linkage area of the parent's stack frame). So, like with the link register, we don't directly spill the CRs with other callee-saved registers, but just mark them to be spilled during prologue generation. In practice, this reverts r179457 for PPC64 (but leaves it in place for PPC32). llvm-svn: 179500	2013-04-15 02:07:05 +00:00
Jakob Stoklund Olesen	3b790b7f2e	Use i32 for all SPARC shift amounts, even in 64-bit mode. Test case by llvm-stress. llvm-svn: 179477	2013-04-14 05:48:50 +00:00
Jakob Stoklund Olesen	8fafe8cd31	Add support for the abs64 SPARC v9 code model. For when 16 TB just isn't enough. llvm-svn: 179474	2013-04-14 05:10:36 +00:00
Jakob Stoklund Olesen	d29f125f5b	Add support for the SPARC v9 abs44 code model. This is the default model for non-PIC 64-bit code. It supports text+data+bss linked anywhere in the low 16 TB of the address space. llvm-svn: 179473	2013-04-14 04:57:51 +00:00
Jakob Stoklund Olesen	b5173ad8fb	Also put target flags on SPARC constant pool references. Constant pool entries are accessed exactly the same way as global variables. llvm-svn: 179471	2013-04-14 04:35:16 +00:00
Jakob Stoklund Olesen	c23fada5f9	Fix patterns for 64-bit pointers. This fixes the pic32 code model for SPARC v9. llvm-svn: 179469	2013-04-14 01:53:23 +00:00
Jakob Stoklund Olesen	6182cc630a	Define SPARC code models. Currently, only abs32 and pic32 are implemented. Add a test case for abs32 with 64-bit code. 64-bit PIC code is currently broken. llvm-svn: 179463	2013-04-13 19:02:23 +00:00
Hal Finkel	978a847acb	Spill and restore PPC CR registers using the FP when we have one For functions that need to spill CRs, and have dynamic stack allocations, the value of the SP during the restore is not what it was during the save, and so we need to use the FP in these cases (as for all of the other spills and restores, but the CR restore has a special code path because its reserved slot, like the link register, is specified directly relative to the adjusted SP). llvm-svn: 179457	2013-04-13 08:09:20 +00:00
Andrew Trick	2bd87ad8d4	Further generalize this scheduler test. The order of copies depends on queue order, which is not very stable. llvm-svn: 179456	2013-04-13 07:37:27 +00:00
Andrew Trick	fb2a8d10f8	Fix a dislexic regex. llvm-svn: 179455	2013-04-13 07:29:21 +00:00
Andrew Trick	1ef71359cd	Add a missing REQUIRES: asserts llvm-svn: 179453	2013-04-13 06:12:46 +00:00
Andrew Trick	861493bc4f	MI-Sched: schedule physreg copies. The register allocator expects minimal physreg live ranges. Schedule physreg copies accordingly. This is slightly tricky when they occur in the middle of the scheduling region. For now, this is handled by rescheduling the copy when its associated instruction is scheduled. Eventually we may instead bundle them, but only if we can preserve the bundles as parallel copies during regalloc. llvm-svn: 179449	2013-04-13 06:07:40 +00:00
Akira Hatanaka	e0468ce3e1	[mips] Reapply r179420 and r179421. llvm-svn: 179434	2013-04-13 00:55:41 +00:00
Akira Hatanaka	b0b85e00d8	Revert r179420 and r179421. llvm-svn: 179422	2013-04-12 22:40:07 +00:00
Akira Hatanaka	737648f84c	[mips] Instruction selection patterns for carry-setting and using add instructions. llvm-svn: 179421	2013-04-12 22:24:52 +00:00
Akira Hatanaka	d809bc8eeb	[mips] v4i8 and v2i16 add, sub and mul instruction selection patterns. llvm-svn: 179420	2013-04-12 22:14:24 +00:00
Nico Rieck	1162bb7a1d	Replace coff-/elf-dump with llvm-readobj llvm-svn: 179361	2013-04-12 04:06:46 +00:00
Nadav Rotem	f96cc4976d	Fix the test on linux by setting the triple and the align format llvm-svn: 179354	2013-04-12 01:07:16 +00:00
Nadav Rotem	662256bafa	Add a flag to align all basic blocks in the function. When debugging performance regressions we often ask ourselves if the regression that we see is due to poor isel/sched/ra or due to some micro-architetural problem. When comparing two code sequences one good way to rule out front-end bottlenecks (and other the issues) is to force code alignment. This pass adds a flag that forces the alignment of all of the basic blocks in the program. llvm-svn: 179353	2013-04-12 00:48:32 +00:00
Preston Gurd	8e4196dfe6	Use FileCheck instead of grep. llvm-svn: 179322	2013-04-11 21:39:01 +00:00
Jack Carter	834846d7a8	Mips specific inline asm memory operand modifier test case These changes are based on commit responses for r179135. llvm-svn: 179315	2013-04-11 19:39:19 +00:00
Eli Bendersky	0ce49fd520	Add a CHECK-NOT for a more faithful translation of the original grep \| count 2. Thanks to Reid Kleckner for catching this. llvm-svn: 179289	2013-04-11 14:43:19 +00:00
Benjamin Kramer	f15ba24b8d	Add missing colons to check lines. llvm-svn: 179277	2013-04-11 12:41:41 +00:00
Benjamin Kramer	4413e71a39	FileCheckize a bunch of tests. llvm-svn: 179276	2013-04-11 12:32:23 +00:00
Michael Liao	877d1576e6	Optimize vector select from all 0s or all 1s As packed comparisons in AVX/SSE produce all 0s or all 1s in each SIMD lane, vector select could be simplified to AND/OR or removed if one or both values being selected is all 0s or all 1s. llvm-svn: 179267	2013-04-11 05:15:54 +00:00
Michael Liao	87125582e9	Enhance bool simplifcation in X86 to handle more cases This patch is revised based on patch from Victor Umansky <victor.umansky@intel.com>. More cases are handled in X86's bool simplification, i.e. - SETCC_CARRY - value is truncated to i1 with AND As a by-product, PR5443 is also fixed. llvm-svn: 179265	2013-04-11 04:43:09 +00:00
Eli Bendersky	90daaa543a	Rewrite some of the test/CodeGen/X86 tests to use FileCheck instead of grep llvm-svn: 179241	2013-04-10 23:30:20 +00:00
Hal Finkel	03d47320aa	Manually remove successors in if conversion when CopyAndPredicateBlock is used In the simple and triangle if-conversion cases, when CopyAndPredicateBlock is used because the to-be-predicated block has other predecessors, we need to explicitly remove the old copied block from the successors list. Normally if conversion relies on TII->AnalyzeBranch combined with BB->CorrectExtraCFGEdges to cleanup the successors list, but if the predicated block contained an un-analyzable branch (such as a now-predicated return), then this will fail. These extra successors were causing a problem on PPC because it was causing later passes (such as PPCEarlyReturm) to leave dead return-only basic blocks in the code. llvm-svn: 179227	2013-04-10 22:05:25 +00:00
Jack Carter	8e319ed798	Mips specific inline asm memory operand modifier test case These changes are based on commit responses for r179135. llvm-svn: 179225	2013-04-10 22:02:32 +00:00
Michel Danzer	c1562afdde	R600/SI: Add pattern for AMDGPUurecip 21 more little piglits with radeonsi. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 179186	2013-04-10 17:17:56 +00:00
Reed Kotler	68e5128508	This is for an experimental option -mips-os16. The idea is to compile all Mips32 code as Mips16 unless it can't be compiled as Mips 16. For now this would happen as long as floating point instructions are not needed. Probably it would also make sense to compile as mips32 if atomic operations are needed too. There may be other cases too. A module pass prescans the IR and adds the mips16 or nomips16 attribute to functions depending on the functions needs. Mips 16 mode can result in a 40% code compression by utililizing 16 bit encoding of many instructions. The hope is for this to replace the traditional gcc way of dealing with Mips16 code using floating point which involves essentially using soft float but with a library implemented using mips32 floating point. This gcc method also requires creating stubs so that Mips32 code can interact with these Mips 16 functions that have floating point needs. My conjecture is that in reality this traditional gcc method would never win over this new method. I will be implementing the traditional gcc method also. Some of it is already done but I needed to do the stubs to finish the work and those required this mips16/32 mixed mode capability. I have more ideas for to make this new method much better and I think the old method will just live in llvm for anyone that needs the backward compatibility but I don't for what reason that would be needed. llvm-svn: 179185	2013-04-10 16:58:04 +00:00
Vincent Lejeune	daa1e69206	R600: Add VTX_READ_* and RAT_WRITE_CACHELESS_* when computing cf addr llvm-svn: 179174	2013-04-10 13:29:20 +00:00
Christian Konig	f40f671bab	R600/SI: dynamical figure out the reg class of MIMG Depending on the number of bits set in the writemask. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 179166	2013-04-10 08:39:16 +00:00
Christian Konig	76cd1a76c2	R600/SI: adjust writemask to only the used components Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 179165	2013-04-10 08:39:08 +00:00
Christian Konig	ffddac18a4	R600/SI: remove image sample writemask Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 179164	2013-04-10 08:39:01 +00:00
Evan Cheng	9f82233851	__sincosf_stret returns sinf / cosf in bits 0:31 and 32:63 of xmm0, not in xmm0 / xmm1. rdar://13599493 llvm-svn: 179141	2013-04-10 01:26:07 +00:00
Jack Carter	03f8f98410	Mips specific inline asm operand modifier 'D' Modifier 'D' is to use the second word of a double integer. We had previously implemented the pure register varient of the modifier and this patch implements the memory reference. #include "stdio.h" int b[8] = {0,1,2,3,4,5,6,7}; void main() { int i; // The first word. Notice, no 'D' {asm ( "lw %0,%1;" : "=r" (i) : "m" ((b+4)) );} printf("%d\n",i); // The second word {asm ( "lw %0,%D1;" : "=r" (i) : "m" ((b+4)) );} printf("%d\n",i); } llvm-svn: 179135	2013-04-09 23:19:50 +00:00
Hal Finkel	8b05494b58	Allow PPC B and BLR to be if-converted into some predicated forms This enables us to form predicated branches (which are the same conditional branches we had before) and also a larger set of predicated returns (including instructions like bdnzlr which is a conditional return and loop-counter decrement all in one). At the moment, if conversion does not capture all possible opportunities. A simple example is provided in early-ret2.ll, where if conversion forms one predicated return, and then the PPCEarlyReturn pass picks up the other one. So, at least for now, we'll keep both mechanisms. llvm-svn: 179134	2013-04-09 22:58:37 +00:00
Reed Kotler	9b753510a5	This patch enables llvm to switch between compiling for mips32/mips64 and mips16 on a per function basis. Because this patch is somewhat involved I have provide an overview of the key pieces of it. The patch is written so as to not change the behavior of the non mixed mode. We have tested this a lot but it is something new to switch subtargets so we don't want any chance of regression in the mainline compiler until we have more confidence in this. Mips32/64 are very different from Mip16 as is the case of ARM vs Thumb1. For that reason there are derived versions of the register info, frame info, instruction info and instruction selection classes. Now we register three separate passes for instruction selection. One which is used to switch subtargets (MipsModuleISelDAGToDAG.cpp) and then one for each of the current subtargets (Mips16ISelDAGToDAG.cpp and MipsSEISelDAGToDAG.cpp). When the ModuleISel pass runs, it determines if there is a need to switch subtargets and if so, the owning pointers in MipsTargetMachine are appropriately changed. When 16Isel or SEIsel is run, they will return immediately without doing any work if the current subtarget mode does not apply to them. In addition, MipsAsmPrinter needs to be reset on a function basis. The pass BasicTargetTransformInfo is substituted with a null pass since the pass is immutable and really needs to be a function pass for it to be used with changing subtargets. This will be fixed in a follow on patch. llvm-svn: 179118	2013-04-09 19:46:01 +00:00
Benjamin Kramer	788f55c7d4	DAGCombiner: Fold a shuffle on CONCAT_VECTORS into a new CONCAT_VECTORS if possible. This pattern occurs in SROA output due to the way vector arguments are lowered on ARM. The testcase from PR15525 now compiles into this, which is better than the code we got with the old scalarrepl: _Store: ldr.w r9, [sp] vmov d17, r3, r9 vmov d16, r1, r2 vst1.8 {d16, d17}, [r0] bx lr Differential Revision: http://llvm-reviews.chandlerc.com/D647 llvm-svn: 179106	2013-04-09 17:41:43 +00:00
Hal Finkel	c72f2476a9	Use virtual base registers on PPC On PowerPC, non-vector loads and stores have r+i forms; however, in functions with large stack frames these were not being used to access slots far from the stack pointer because such slots were out of range for the signed 16-bit immediate offset field. This increases register pressure because we need a separate register for each offset (when the r+r form is used). By enabling virtual base registers, we can deal with large stack frames without unduly increasing register pressure. llvm-svn: 179105	2013-04-09 17:27:09 +00:00
Hal Finkel	68742d4f1f	Convert test PowerPC/2007-09-07-LoadStoreIdxForms to FileCheck llvm-svn: 179104	2013-04-09 17:26:55 +00:00
Jakob Stoklund Olesen	6a275455ac	Compute correct frame sizes for SPARC v9 64-bit frames. The save area is twice as big and there is no struct return slot. The stack pointer is always 16-byte aligned (after adding the bias). Also eliminate the stack adjustment instructions around calls when the function has a reserved stack frame. llvm-svn: 179083	2013-04-09 04:37:47 +00:00
Hal Finkel	0daaa8e2de	Generate PPC early conditional returns PowerPC has a conditional branch to the link register (return) instruction: BCLR. This should be used any time when we'd otherwise have a conditional branch to a return. This adds a small pass, PPCEarlyReturn, which runs just prior to the branch selection pass (and, importantly, after block placement) to generate these conditional returns when possible. It will also eliminate unconditional branches to returns (these happen rarely; most of the time these have already been tail duplicated by the time PPCEarlyReturn is invoked). This is a nice optimization for small functions that do not maintain a stack frame. llvm-svn: 179026	2013-04-08 16:24:03 +00:00
Tim Northover	8eb5637d73	AArch64: remove barriers from AArch64 atomic operations. I've managed to convince myself that AArch64's acquire/release instructions are sufficient to guarantee C++11's required semantics, even in the sequentially-consistent case. llvm-svn: 179005	2013-04-08 08:40:41 +00:00
Hal Finkel	9af1fa0d50	Cleanup and improve PPC fsel generation First, we should not cheat: fsel-based lowering of select_cc is a finite-math-only optimization (the ISA manual, section F.3 of v2.06, makes this clear, as does a note in our own README). This also adds fsel-based lowering of EQ and NE condition codes. As it turned out, fsel generation was covered by a grand total of zero regression test cases. I've added some test cases to cover the existing behavior (which is now finite-math only), as well as the new EQ cases. llvm-svn: 179000	2013-04-07 22:11:09 +00:00
Jakob Stoklund Olesen	658f7754a3	Implement LowerCall_64 for the SPARC v9 64-bit ABI. There is still no support for byval arguments (which I don't think are needed) and varargs. llvm-svn: 178993	2013-04-07 19:10:57 +00:00
Jakob Stoklund Olesen	594e073ba0	Implement LowerReturn_64 for SPARC v9. Integer return values are sign or zero extended by the callee, and structs up to 32 bytes in size can be returned in registers. The CC_Sparc64 CallingConv definition is shared between LowerFormalArguments_64 and LowerReturn_64. Function arguments and return values are passed in the same registers. The inreg flag is also used for return values. This is required to handle C functions returning structs containing floats and ints: struct ifp { int i; float f; }; struct ifp f(void); LLVM IR: define inreg { i32, float } @f() { ... ret { i32, float } %retval } The ABI requires that %retval.i is returned in the high bits of %i0 while %retval.f goes in %f1. Without the inreg return value attribute, %retval.i would go in %i0 and %retval.f would go in %f3 which is a more efficient way of returning %multiple values, but it is not ABI compliant for returning C structs. llvm-svn: 178966	2013-04-06 23:57:33 +00:00
Jakob Stoklund Olesen	2e3d792bf5	SPARC v9 stack pointer bias. 64-bit SPARC v9 processes use biased stack and frame pointers, so the current function's stack frame is located at %sp+BIAS .. %fp+BIAS where BIAS = 2047. This makes more local variables directly accessible via [%fp+simm13] addressing. llvm-svn: 178965	2013-04-06 21:38:57 +00:00
Hal Finkel	6c65d3a736	Implement PPCInstrInfo::FoldImmediate There are certain PPC instructions into which we can fold a zero immediate operand. We can detect such cases by looking at the register class required by the using operand (so long as it is not otherwise constrained). llvm-svn: 178961	2013-04-06 19:30:30 +00:00
Jakob Stoklund Olesen	2076ffbc15	Complete formal arguments for the SPARC v9 64-bit ABI. All arguments are formally assigned to stack positions and then promoted to floating point and integer registers. Since there are more floating point registers than integer registers, this can cause situations where floating point arguments are assigned to registers after integer arguments that where assigned to the stack. Use the inreg flag to indicate 32-bit fragments of structs containing both float and int members. The three-way shadowing between stack, integer, and floating point registers requires custom argument lowering. The good news is that return values are passed in the exact same way, and we can share the code. Still missing: - Update LowerReturn to handle structs returned in registers. - LowerCall. - Variadic functions. llvm-svn: 178958	2013-04-06 18:32:12 +00:00
Tom Stellard	8ad4f7c25b	R600/SI: Add support for buffer stores v2 v2: - Use the ADDR64 bit Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 178931	2013-04-05 23:31:51 +00:00
Tom Stellard	917d7412f1	R600/SI: Add processor types for each SI variant Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 178928	2013-04-05 23:31:35 +00:00
Tom Stellard	17fba38a4b	R600/SI: Avoid generating S_MOVs with 64-bit immediates v2 SITargetLowering::analyzeImmediate() was converting the 64-bit values to 32-bit and then checking if they were an inline immediate. Some of these conversions caused this check to succeed and produced S_MOV instructions with 64-bit immediates, which are illegal. v2: - Clean up logic Reviewed-by: Christian König <christian.koenig@amd.com> llvm-svn: 178927	2013-04-05 23:31:20 +00:00
Hal Finkel	b556d850c9	Enable early if conversion on PPC On cores for which we know the misprediction penalty, and we have the isel instruction, we can profitably perform early if conversion. This enables us to replace some small branch sequences with selects and avoid the potential stalls from mispredicting the branches. Enabling this feature required implementing canInsertSelect and insertSelect in PPCInstrInfo; isel code in PPCISelLowering was refactored to use these functions as well. llvm-svn: 178926	2013-04-05 23:29:01 +00:00
Timur Iskhodzhanov	c004e9db2d	Make the test/CodeGen/X86/win32_sret.ll reliable on any CPU by explicitly specifying the -mcpu llvm-svn: 178885	2013-04-05 17:05:56 +00:00
Renato Golin	9d05117f2b	Reverting 178851 as it broke buildbots llvm-svn: 178883	2013-04-05 16:39:53 +00:00
Stepan Dyatkovskiy	98f7dac944	Fix for PR14824: "Optimization arm_ldst_opt inserts newly generated instruction vldmia at incorrect position". Patch introduces memory operands tracking in ARMLoadStoreOpt::LoadStoreMultipleOpti. For each register it keeps the order of load operations as it was before optimization pass. It is kind of deep improvement of fix proposed by Hao: http://llvm.org/bugs/show_bug.cgi?id=14824#c4 But it also tracks conflicts between different register classes (e.g. D2 and S5). For more details see: Bug description: http://llvm.org/bugs/show_bug.cgi?id=14824 LLVM Commits discussion: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130311/167936.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130318/168688.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130325/169376.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170238.html llvm-svn: 178851	2013-04-05 05:52:14 +00:00
Andrew Trick	6da34cd35e	RegisterPressure heuristics currently require signed comparisons. llvm-svn: 178823	2013-04-05 00:31:34 +00:00
Hal Finkel	1dc78e666b	PPC: Improve code generation for mixed-precision reciprocal sqrt The DAGCombine logic that recognized a/sqrt(b) and transformed it into a multiplication by the reciprocal sqrt did not handle cases where the sqrt and the division were separated by an fpext or fptrunc. llvm-svn: 178801	2013-04-04 22:44:12 +00:00
Jakob Stoklund Olesen	a53fa8d450	Avoid high-latency false CPSR dependencies even for tMOVSi. The Thumb2SizeReduction pass avoids false CPSR dependencies, except it still aggressively creates tMOVi8 instructions because they are so common. Avoid creating false CPSR dependencies even for tMOVi8 instructions when the the CPSR flags are known to have high latency. This allows integer computation to overlap floating point computations. Also process blocks in a reverse post-order and propagate high-latency flags to successors. <rdar://problem/13468102> llvm-svn: 178773	2013-04-04 18:25:36 +00:00
Stepan Dyatkovskiy	0562afa331	New-password-test commit. llvm-svn: 178765	2013-04-04 16:11:18 +00:00
Vincent Lejeune	a680946842	R600: Take export into account when computing cf address llvm-svn: 178761	2013-04-04 13:59:59 +00:00
Jakob Stoklund Olesen	1969a96fcd	Add SPARC v9 support for select on 64-bit compares. This requires v9 cmov instructions using the %xcc flags instead of the %icc flags. Still missing: - Select floats on %xcc flags. - Select i64 on %fcc flags. llvm-svn: 178737	2013-04-04 03:08:00 +00:00
Vincent Lejeune	6a4ef74f44	R600: Fix last ALU of a clause being emitted in a separate clause llvm-svn: 178675	2013-04-03 18:24:47 +00:00
Bill Schmidt	990515e4c4	Fix PR15632: No support for ppcf128 floating-point remainder on PowerPC. For this we need to use a libcall. Previously LLVM didn't implement libcall support for frem, so I've added it in the usual straightforward manner. A test case from the bug report is included. llvm-svn: 178639	2013-04-03 13:05:44 +00:00
Timur Iskhodzhanov	0976f711d6	Temporarily relax the WIN32 checks in the SRet test to fix the Atom D2700 bot llvm-svn: 178635	2013-04-03 12:17:15 +00:00
Timur Iskhodzhanov	ecd533f0ec	Fix SRet for thiscall in i686-pc-win32 llvm-svn: 178634	2013-04-03 11:27:54 +00:00
Jakob Stoklund Olesen	3b7eaf9bb6	Add 64-bit compare + branch for SPARC v9. The same compare instruction is used for 32-bit and 64-bit compares. It sets two different sets of flags: icc and xcc. This patch adds a conditional branch instruction using the xcc flags for 64-bit compares. llvm-svn: 178621	2013-04-03 04:41:44 +00:00
Hal Finkel	0208f7c744	Use PPC reciprocal estimates with Newton iteration in fast-math mode When unsafe FP math operations are enabled, we can use the fre[s] and frsqrte[s] instructions, which generate reciprocal (sqrt) estimates, together with some Newton iteration, in order to quickly generate floating-point division and sqrt results. All of these instructions are separately optional, and so each has its own feature flag (except for the Altivec instructions, which are covered under the existing Altivec flag). Doing this is not only faster than using the IEEE-compliant fdiv/fsqrt instructions, but allows these computations to be pipelined with other computations in order to hide their overall latency. I've also added a couple of missing fnmsub patterns which turned out to be missing (but are necessary for good code generation of the Newton iterations). Altivec needs a similar fix, but that will probably be more complicated because fneg is expanded for Altivec's v4f32. llvm-svn: 178617	2013-04-03 04:01:11 +00:00
Akira Hatanaka	f08d3a5a83	[mips] Small update to the implementation of eh.return for Mips. This patch initializes t9 to the handler address, but only if the relocation model is pic. This handles the case where handler to which eh.return jumps points to the start of the function. Patch by Sasa Stankovic. llvm-svn: 178588	2013-04-02 23:02:07 +00:00
NAKAMURA Takumi	d8a9117bcb	llvm/test/CodeGen/X86: Unmark them out of XFAIL:cygming, in atomic{32\|64}.ll and handle-move.ll, corresponding to r178549. This reverts r176808, r176798, and r177914. llvm-svn: 178583	2013-04-02 22:35:08 +00:00
Bill Schmidt	c98ed219d3	Fix PR15630: Replace faulty stdcx. with stwcx. When doing a partword atomic operation, a lwarx was being paired with a stdcx. instead of a stwcx. when compiling for a 64-bit target. The target has nothing to do with it in this case; we always need a stwcx. Thanks to Kai Nacke for reporting the problem. llvm-svn: 178559	2013-04-02 18:37:08 +00:00
Jakob Stoklund Olesen	b0a5a72daf	Don't attempt MTM heuristics without a scheduling model present. This should fix the PPC buildbots. llvm-svn: 178558	2013-04-02 18:26:45 +00:00
Chad Rosier	908153170e	[fast-isel] Use the correct API to disable FastLowerArguments for Win64. llvm-svn: 178549	2013-04-02 16:31:41 +00:00
Arnold Schwaighofer	c9508f817a	DAGCombiner: Merge store/loads when we have extload/truncstores This is helps on architectures where i8,i16 are not legal but we have byte, and short loads/stores. Allowing us to merge copies like the one below on ARM. copy(char a, char b, int n) { do { int t0 = a[0]; int t1 = a[1]; b[0] = t0; b[1] = t1; radar://13536387 llvm-svn: 178546	2013-04-02 15:58:51 +00:00
Preston Gurd	fca710bf70	Simplify test cases for Atom preferring call register indirect over call memory indirect (32 and 64 bit). llvm-svn: 178541	2013-04-02 14:25:06 +00:00
Jakob Stoklund Olesen	8a184a7fe4	Add 64-bit load and store instructions. There is only a few new instructions, the rest is handled with patterns. llvm-svn: 178528	2013-04-02 04:09:28 +00:00
Jakob Stoklund Olesen	22fe26207f	Basic 64-bit ALU operations. SPARC v9 extends all ALU instructions to 64 bits, so we simply need to add patterns to use them for both i32 and i64 values. llvm-svn: 178527	2013-04-02 04:09:23 +00:00
Jakob Stoklund Olesen	d57f9ab92f	Materialize 64-bit immediates. The last resort pattern produces 6 instructions, and there are still opportunities for materializing some immediates in fewer instructions. llvm-svn: 178526	2013-04-02 04:09:17 +00:00
Jakob Stoklund Olesen	5ef2195726	Add 64-bit shift instructions. SPARC v9 defines new 64-bit shift instructions. The 32-bit shift right instructions are still usable as zero and sign extensions. This adds new F3_Sr and F3_Si instruction formats that probably should be used for the 32-bit shifts as well. They don't really encode an simm13 field. llvm-svn: 178525	2013-04-02 04:09:12 +00:00
Jakob Stoklund Olesen	9fbfce2d11	Add support for 64-bit calling convention. This is far from complete, but it is enough to make it possible to write test cases using i64 arguments. Missing features: - Floating point arguments. - Receiving arguments on the stack. - Calls. llvm-svn: 178523	2013-04-02 04:09:02 +00:00
Vincent Lejeune	dc0e12bd5b	R600: Add support for native control flow llvm-svn: 178505	2013-04-01 21:48:05 +00:00
Vincent Lejeune	11918406b3	R600: Emit CF_ALU and use true kcache register. llvm-svn: 178503	2013-04-01 21:47:42 +00:00
Hal Finkel	9191cdb5f2	Fix a bad assert in PPCTargetLowering llvm-svn: 178489	2013-04-01 18:42:58 +00:00
Hal Finkel	9cd1e5e93c	Add triple to test/CodeGen/PowerPC/stfiwx-2 llvm-svn: 178486	2013-04-01 18:18:44 +00:00
Arnold Schwaighofer	a2a475a83d	Merge load/store sequences with adresses: base + index + offset We would also like to merge sequences that involve a variable index like in the example below. int index = *idx++ int i0 = c[index+0]; int i1 = c[index+1]; b[0] = i0; b[1] = i1; By extending the parsing of the base pointer to handle dags that contain a base, index, and offset we can handle examples like the one above. The dag for the code above will look something like: (load (i64 add (i64 copyfromreg %c) (i64 signextend (i8 load %index)))) (load (i64 add (i64 copyfromreg %c) (i64 signextend (i32 add (i32 signextend (i8 load %index)) (i32 1))))) The code that parses the tree ignores the intermediate sign extensions. However, if there is a sign extension it needs to be on all indexes. (load (i64 add (i64 copyfromreg %c) (i64 signextend (add (i8 load %index) (i8 1)))) vs (load (i64 add (i64 copyfromreg %c) (i64 signextend (i32 add (i32 signextend (i8 load %index)) (i32 1))))) radar://13536387 llvm-svn: 178483	2013-04-01 18:12:58 +00:00
Hal Finkel	f184647a53	Add more PPC floating-point conversion instructions The P7 and A2 have additional floating-point conversion instructions which allow a direct two-instruction sequence (plus load/store) to convert from all combinations (signed/unsigned i32/i64) <--> (float/double) (on previous cores, only some combinations were directly available). llvm-svn: 178480	2013-04-01 17:52:07 +00:00
Hal Finkel	55f144f923	Fix PowerPC/cttz.ll to specify a cpu (and use FileCheck) llvm-svn: 178472	2013-04-01 16:31:56 +00:00
Hal Finkel	9eed3ac928	Add the PPC popcntw instruction The popcntw instruction is available whenever the popcntd instruction is available, and performs a separate popcnt on the lower and upper 32-bits. Ignoring the high-order count, this can be used for the 32-bit input case (saving on the explicit zero extension otherwise required to use popcntd). llvm-svn: 178470	2013-04-01 15:58:15 +00:00
Benjamin Kramer	790bd5fb50	X86: Promote sitofp <8 x i16> to <8 x i32> when AVX is available. A vector sext + sitofp is a lot cheaper than 8 scalar conversions. llvm-svn: 178448	2013-03-31 12:49:15 +00:00
Hal Finkel	085f61160f	Add the PPC lfiwax instruction This instruction is available on modern PPC64 CPUs, and is now used to improve the SINT_TO_FP lowering (by eliminating the need for the separate sign extension instruction and decreasing the amount of needed stack space). llvm-svn: 178446	2013-03-31 10:12:51 +00:00
Hal Finkel	7bdfbd6570	Cleanup PPC(64) i32 -> float/double conversion The existing SINT_TO_FP code for i32 -> float/double conversion was disabled because it relied on broken EXTSW_32/STD_32 instruction definitions. The original intent had been to enable these 64-bit instructions to be used on CPUs that support them even in 32-bit mode. Unfortunately, this form of lying to the infrastructure was buggy (as explained in the FIXME comment) and had therefore been disabled. This re-enables this functionality, using regular DAG nodes, but only when compiling in 64-bit mode. The old STD_32/EXTSW_32 definitions (which were dead) are removed. llvm-svn: 178438	2013-03-31 01:58:02 +00:00
Benjamin Kramer	86e90ea8b4	DAGCombine: visitXOR can replace a node without returning it, bail out in that case. Fixes the crash reported in PR15608. llvm-svn: 178429	2013-03-30 21:28:18 +00:00
Benjamin Kramer	50725426cb	Change '@SECREL' suffix to GAS-compatible '@SECREL32'. '@SECREL' is what is used by the Microsoft assembler, but GNU as expects '@SECREL32'. With the patch, the MC-generated code works fine in combination with a recent GNU as (2.23.51.20120920 here). Patch by David Nadlinger! Differential Revision: http://llvm-reviews.chandlerc.com/D429 llvm-svn: 178427	2013-03-30 16:21:50 +00:00
Justin Holewinski	21480942b2	[NVPTX] Remove support for SM < 2.0. This was never fully supported anyway. llvm-svn: 178417	2013-03-30 14:29:30 +00:00
Justin Holewinski	23056edada	[NVPTX] Add NVVMReflect pass to allow compile-time selection of specific code paths. This allows us to write code like: if (__nvvm_reflect("FOO")) // Do something else // Do something else and compile into a library, then give "FOO" a value at kernel compile-time so the check becomes a no-op. llvm-svn: 178416	2013-03-30 14:29:25 +00:00
Akira Hatanaka	bc81d23802	[mips] Add patterns for DSP indexed load instructions. llvm-svn: 178408	2013-03-30 02:14:45 +00:00
Akira Hatanaka	5ff9493456	[mips] Fix DSP instructions to have explicit accumulator register operands. Check that instruction selection can select multiply-add/sub DSP instructions from a pattern that doesn't have intrinsics. llvm-svn: 178406	2013-03-30 01:58:00 +00:00
Akira Hatanaka	6c9ddf6943	[mips] Move the code which does dag-combine for multiply-add/sub nodes to derived class MipsSETargetLowering. We shouldn't be generating madd/msub nodes if target is Mips16, since Mips16 doesn't have support for multipy-add/sub instructions. llvm-svn: 178404	2013-03-30 01:42:24 +00:00
Timur Iskhodzhanov	d7d35221f7	Exclude the X86/complex-fca.ll test at it probably wasn't supposed to work on Windows llvm-svn: 178375	2013-03-29 21:54:00 +00:00
Hal Finkel	c14a74d243	Implement FRINT lowering on PPC using frin Like nearbyint, rint can be implemented on PPC using the frin instruction. The complication comes from the fact that rint needs to set the FE_INEXACT flag when the result does not equal the input value (and frin does not do that). As a result, we use a custom inserter which, after the rounding, compares the rounded value with the original, and if they differ, explicitly sets the XX bit in the FPSCR register (which corresponds to FE_INEXACT). Once LLVM has better modeling of the floating-point environment we should be able to (often) eliminate this extra complexity. llvm-svn: 178362	2013-03-29 19:41:55 +00:00
Benjamin Kramer	279e5cfa9a	Remove the old CodePlacementOpt pass. It was superseded by MachineBlockPlacement and disabled by default since LLVM 3.1. llvm-svn: 178349	2013-03-29 17:14:24 +00:00
Hal Finkel	3493fd8e51	Add PPC FP rounding instructions fri[mnpz] These instructions are available on the P5x (and later) and on the A2. They implement the standard floating-point rounding operations (floor, trunc, etc.). One caveat: frin (round to nearest) does not implement "ties to even", and so is only enabled in fast-math mode. llvm-svn: 178337	2013-03-29 08:57:48 +00:00
Michael Liao	427149cbcf	Add support of RDSEED defined in AVX2 extension llvm-svn: 178314	2013-03-28 23:41:26 +00:00
Michael Liao	aec693ab31	Enhance boolean simplification to handle 16-/64-bit RDRAND - RDRAND always clears the destination value when a random value is not available (i.e. CF == 0). This value is truncated or zero-extended as the false boolean value to be returned. Boolean simplification needs to skip this 'zext' or 'trunc' node. llvm-svn: 178312	2013-03-28 23:38:52 +00:00
Timur Iskhodzhanov	d7de83d51c	Make Win32 put the SRet address into EAX, fixes PR15556 llvm-svn: 178291	2013-03-28 21:30:04 +00:00
Hal Finkel	8672bd7c2a	Specify CPUs on the PPC bswap-load-store test Otherwise, the CHECK-NOT's might trigger depending on the host's CPU. llvm-svn: 178287	2013-03-28 20:35:18 +00:00
Hal Finkel	88670ad5f4	Only enable 64-bit bswap DAG combines for PPC64 Compiling in 32-bit mode on a P7 would assert after 64-bit DAG combines were added for bswap with load/store. This is because these combines are really only valid in 64-bit mode, regardless of the CPU (and this was not being checked). llvm-svn: 178286	2013-03-28 20:23:46 +00:00
Jyotsna Verma	f28cc49519	Hexagon: Enable SupportDebugInfomation and DwarfInSection flags. llvm-svn: 178279	2013-03-28 19:34:49 +00:00
Hal Finkel	f359927db6	Add the PPC64 ldbrx/stdbrx instructions These are 64-bit load/store with byte-swap, and available on the P7 and the A2. Like the similar instructions for 16- and 32-bit words, these are matched in the target DAG-combine phase against load/store-bswap pairs. llvm-svn: 178276	2013-03-28 19:25:55 +00:00
Jyotsna Verma	8a524534a6	Hexagon: Use multiclass for gp-relative instructions. Remove noV4T gp-relative instructions. llvm-svn: 178246	2013-03-28 16:25:57 +00:00
Hal Finkel	c21c3cf09e	Add the PPC64 popcntd instruction PPC ISA 2.06 (P7, A2, etc.) has a popcntd instruction. Add this instruction and tell TTI about it so that popcount-loop recognition will know about it. llvm-svn: 178233	2013-03-28 13:29:47 +00:00
Hal Finkel	4d8aed70c1	Cleanup PPC CR-spill kill flags and 32- vs. 64-bit instructions There were a few places where kill flags were not being set correctly, and where 32-bit instruction variants were being used with 64-bit registers. After r178180, this code was being triggered causing llc to assert. llvm-svn: 178220	2013-03-28 03:38:16 +00:00
David Blaikie	377434ec76	Revert "Adding DIImportedModules to DIScopes." This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7. Turns out we're going with a different schema design to represent DW_TAG_imported_modules so we won't need this extra field. llvm-svn: 178215	2013-03-28 02:44:59 +00:00
Preston Gurd	787c145b5f	This patch follows is a follow up to r178171, which uses the register form of call in preference to memory indirect on Atom. In this case, the patch applies the optimization to the code for reloading spilled registers. The patch also includes changes to sibcall.ll and movgs.ll, which were failing on the Atom buildbot after the first patch was applied. This patch by Sriram Murali. llvm-svn: 178193	2013-03-27 23:16:18 +00:00
Preston Gurd	b6ed645cb6	For the current Atom processor, the fastest way to handle a call indirect through a memory address is to load the memory address into a register and then call indirect through the register. This patch implements this improvement by modifying SelectionDAG to force a function address which is a memory reference to be loaded into a virtual register. Patch by Sriram Murali. llvm-svn: 178171	2013-03-27 19:14:02 +00:00
Christian Konig	510c335233	R600/SI: add SETO/SETUO patterns 6 more piglit tests. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 178145	2013-03-27 15:27:31 +00:00
Hal Finkel	937268691d	Print PPC ZERO as 0 (not r0) even on Darwin It seems that the Darwin PPC assembler requires r0 to be written as 0 when it means 0 (at least in lwarx/stwcx.). Fixes PR15605. llvm-svn: 178142	2013-03-27 13:20:52 +00:00
Silviu Baranga	bd61af84a7	Enabling the generation of dependency breakers for partial updates on Cortex-A15. Also fixing a small bug in getting the update clearence for VLD1LNd32. llvm-svn: 178134	2013-03-27 12:38:44 +00:00
Christian Konig	fb305cbcea	R600/SI: add cummuting of rev instructions Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Tested-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 178127	2013-03-27 09:12:59 +00:00

... 3 4 5 6 7 ...

7574 Commits