llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00

Author	SHA1	Message	Date
Chad Rosier	d9dca3aef3	[arm-fast-isel] Add support for non-global callee. Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 157336	2012-05-23 18:38:57 +00:00
Nuno Lopes	944814b41a	revert my previous patches that introduced an additional parameter to the objectsize intrinsic. After a lot of discussion, we realized it's not the best option for run-time bounds checking llvm-svn: 157255	2012-05-22 15:25:31 +00:00
Jakob Stoklund Olesen	3bee83c3a5	Transfer memory operands to the right instruction. They need to go on the PICLDR as the verifier points out. llvm-svn: 157151	2012-05-20 06:38:42 +00:00
Jim Grosbach	343a996ca5	Refactor data-in-code annotations. Use a dedicated MachO load command to annotate data-in-code regions. This is the same format the linker produces for final executable images, allowing consistency of representation and use of introspection tools for both object and executable files. Data-in-code regions are annotated via ".data_region"/".end_data_region" directive pairs, with an optional region type. data_region_directive := ".data_region" { region_type } region_type := "jt8" \| "jt16" \| "jt32" \| "jta32" end_data_region_directive := ".end_data_region" The previous handling of ARM-style "$d.*" labels was broken and has been removed. Specifically, it didn't handle ARM vs. Thumb mode when marking the end of the section. rdar://11459456 llvm-svn: 157062	2012-05-18 19:12:01 +00:00
Tim Northover	fc7737e7dc	Remove incorrect pattern for ARM SMML instruction. Patch by Meador Inge. llvm-svn: 156989	2012-05-17 13:12:13 +00:00
Jakob Stoklund Olesen	1da786c936	Enable sub-sub-register copy coalescing. It is now possible to coalesce weird skewed sub-register copies by picking a super-register class larger than both original registers. The included test case produces code like this: vld2.32 {d16, d17, d18, d19}, [r0]! vst2.32 {d18, d19, d20, d21}, [r0] We still perform interference checking as if it were a normal full copy join, so this is still quite conservative. In particular, the f1 and f2 functions in the included test case still have remaining copies because of false interference. llvm-svn: 156878	2012-05-15 23:31:35 +00:00
Chad Rosier	4a65a2a197	[fast-isel] Add support for selecting @llvm.trap(). llvm-svn: 156646	2012-05-11 21:33:49 +00:00
Chad Rosier	72bd34ca71	[fast-isel] Remove -disable-arm-fast-isel option. -fast-isel=0 suffices. Minor cleanup. llvm-svn: 156632	2012-05-11 19:40:25 +00:00
Chad Rosier	c20de37076	[fast-isel] Cleaner fix for when we're unable to handle a non-double multi-reg retval. Hoists check before emitting the call to avoid unnecessary work. rdar://11430407 PR12796 llvm-svn: 156628	2012-05-11 18:51:55 +00:00
Manman Ren	c82d0e71b9	ARM: peephole optimization to remove cmp instruction This patch will optimize the following cases: sub r1, r3 \| sub r1, imm cmp r3, r1 or cmp r1, r3 \| cmp r1, imm bge L1 TO subs r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction. rdar: 10734411 llvm-svn: 156599	2012-05-11 01:30:47 +00:00
Manman Ren	5abcae1320	Revert: 156550 "ARM: peephole optimization to remove cmp instruction" This commit broke an external linux bot and gave a compile-time warning. llvm-svn: 156556	2012-05-10 18:49:43 +00:00
Manman Ren	727b7d5e4c	ARM: peephole optimization to remove cmp instruction This patch will optimize the following cases: sub r1, r3 \| sub r1, imm cmp r3, r1 or cmp r1, r3 \| cmp r1, imm bge L1 TO subs r1, r3 bge L1 or ble L1 If the branch instruction can use flag from "sub", then we can replace "sub" with "subs" and eliminate the "cmp" instruction. rdar: 10734411 llvm-svn: 156550	2012-05-10 16:48:21 +00:00
Nuno Lopes	e8880a9916	change the objectsize intrinsic signature: add a 3rd parameter to denote the maximum runtime performance penalty that the user is willing to accept. This commit only adds the parameter. Code taking advantage of it will follow. llvm-svn: 156473	2012-05-09 15:52:43 +00:00
Owen Anderson	8adb0322ce	Teach DAG combine to fold x-x to 0.0 when unsafe FP math is enabled. llvm-svn: 156324	2012-05-07 20:51:25 +00:00
Owen Anderson	e3d41b44cc	Teach DAGCombine the same multiply-by-1.0 folding trick when doing FMAs, just like it now knows for FMULs. llvm-svn: 156029	2012-05-02 22:17:40 +00:00
Owen Anderson	3cfe269707	Teach DAG combine that multiplication by 1.0 can always be constant folded. llvm-svn: 156023	2012-05-02 21:32:35 +00:00
Bob Wilson	d2f6ff588b	Don't introduce illegal types when creating vmull operations. <rdar://11324364> ARM BUILD_VECTORs created after type legalization cannot use i8 or i16 operands, since those types are not legal. Instead use i32 operands, which will be implicitly truncated by the BUILD_VECTOR to match the element type. llvm-svn: 155824	2012-04-30 16:53:34 +00:00
Lang Hames	7d83af4ed0	Fix the order of the operands in the llvm.fma intrinsic patterns for ARM, <rdar://problem/11325085>. llvm-svn: 155724	2012-04-27 18:51:24 +00:00
Evan Cheng	f35523d08a	Implement a bastardized ABI. llvm-svn: 155686	2012-04-27 02:11:10 +00:00
Tim Northover	876c151146	Use VLD1 in NEON extenting-load patterns instead of VLDR. On some cores it's a bad idea for performance to mix VFP and NEON instructions and since these patterns are NEON anyway, the NEON load should be used. llvm-svn: 155630	2012-04-26 08:46:29 +00:00
Evan Cheng	7f9bf43bcf	MachineBasicBlock::SplitCriticalEdge() should follow LLVM IR variant and refuse to break edge to EH landing pad. rdar://11300144 llvm-svn: 155470	2012-04-24 19:06:55 +00:00
James Molloy	44927f5296	Fix bad EXTRACT_SUBREG in instruction selection for extending-loads on NEON. llvm-svn: 154915	2012-04-17 08:18:00 +00:00
Jakob Stoklund Olesen	8bea8a2373	FileCheckize these tests. Add an extra test to ldr_post with an immediate increment. llvm-svn: 154859	2012-04-16 20:56:42 +00:00
Jakob Stoklund Olesen	8d6641a2b2	Disable code placement for this test. It makes it less sensitive to small changes in heuristics. llvm-svn: 154857	2012-04-16 20:49:06 +00:00
Chandler Carruth	728acc9bd9	Flip the new block-placement pass to be on by default. This is mostly to test the waters. I'd like to get results from FNT build bots and other bots running on non-x86 platforms. This feature has been pretty heavily tested over the last few months by me, and it fixes several of the execution time regressions caused by the inlining work by preventing inlining decisions from radically impacting block layout. I've seen very large improvements in yacr2 and ackermann benchmarks, along with the expected noise across all of the benchmark suite whenever code layout changes. I've analyzed all of the regressions and fixed them, or found them to be impossible to fix. See my email to llvmdev for more details. I'd like for this to be in 3.1 as it complements the inliner changes, but if any failures are showing up or anyone has concerns, it is just a flag flip and so can be easily turned off. I'm switching it on tonight to try and get at least one run through various folks' performance suites in case SPEC or something else has serious issues with it. I'll watch bots and revert if anything shows up. llvm-svn: 154816	2012-04-16 13:49:17 +00:00
Evan Cheng	3499593c7e	On Darwin targets, only use vfma etc. if the source use fma() intrinsic explicitly. llvm-svn: 154689	2012-04-13 18:59:28 +00:00
Evan Cheng	f138fb4599	Add more fused mul+add/sub patterns. rdar://10139676 llvm-svn: 154484	2012-04-11 06:59:47 +00:00
Evan Cheng	b5291aea18	Match (fneg (fma) to vfnma. rdar://10139676 llvm-svn: 154469	2012-04-11 01:21:25 +00:00
Evan Cheng	eaf8eba8c4	Merge fma.ll into fusedMAC.ll llvm-svn: 154466	2012-04-11 01:03:11 +00:00
Owen Anderson	a8319713a4	Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point. Zap a testcase that this allows us to completely fold away. llvm-svn: 154447	2012-04-10 22:46:53 +00:00
Evan Cheng	f9617f7f54	Handle llvm.fma.* intrinsics. rdar://10914096 llvm-svn: 154439	2012-04-10 21:40:28 +00:00
Eric Christopher	ec1405e930	To ensure that we have more accurate line information for a block don't elide the branch instruction if it's the only one in the block, otherwise it's ok. PR9796 and rdar://11215207 llvm-svn: 154417	2012-04-10 18:18:10 +00:00
Anton Korobeynikov	0fc5fe0430	Transform div to mul with reciprocal only when fp imm is legal. This fixes PR12516 and uncovers one weird problem in legalize (workarounded) llvm-svn: 154394	2012-04-10 13:22:49 +00:00
Evan Cheng	d9ff163215	Add proper checks. llvm-svn: 154379	2012-04-10 03:15:42 +00:00
Evan Cheng	5825e9dbf5	Fix a long standing tail call optimization bug. When a libcall is emitted legalizer always use the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Chad Rosier	a588421976	When performing a truncating store, it's possible to rearrange the data in-register, such that we can use a single vector store rather then a series of scalar stores. For func_4_8 the generated code vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vmov.u16 r0, d16[3] strb r0, [r2, #3] vmov.u16 r0, d16[2] strb r0, [r2, #2] vmov.u16 r0, d16[1] strb r0, [r2, #1] vmov.u16 r0, d16[0] strb r0, [r2] bx lr becomes vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vuzp.8 d16, d17 vst1.32 {d16[0]}, [r2, :32] bx lr I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll, but I couldn't think of a way to judiciously apply this combine. This ldrh r0, [r0, #4] strh r0, [r1] becomes vldr d16, [r0] vmov.u16 r0, d16[2] vmov.32 d16[0], r0 vuzp.16 d16, d17 vst1.32 {d16[0]}, [r1, :32] PR11158 rdar://10703339 llvm-svn: 154340	2012-04-09 20:32:02 +00:00
Duncan Sands	cd52f3d447	Convert floating point division by a constant into multiplication by the reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265	2012-04-07 20:04:00 +00:00
Jakob Stoklund Olesen	bb7b631def	Allow negative immediates in ARM and Thumb2 compares. ARM and Thumb2 mode can use cmn instructions to compare against negative immediates. Thumb1 mode can't. llvm-svn: 154183	2012-04-06 17:45:04 +00:00
James Molloy	3604b95957	An oversight when applying the patches for r150956 and r150957 to a vanilla tree meant I forgot to svn add these testcases. Noticed while investigating PR12274! llvm-svn: 154090	2012-04-05 10:01:12 +00:00
Jakob Stoklund Olesen	e1ae4f161c	Pass the right sign to TLI->isLegalICmpImmediate. LSR can fold three addressing modes into its ICmpZero node: ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset ICmpZero -1ScaleReg + Offset => ICmp ScaleReg, Offset ICmpZero BaseReg + -1ScaleReg => ICmp BaseReg, ScaleReg The first two cases are only used if TLI->isLegalICmpImmediate() likes the offset. Make sure the right Offset sign is passed to this method in the second case. The ARM version is not symmetric. <rdar://problem/11184260> llvm-svn: 154079	2012-04-05 03:10:56 +00:00
Jakob Stoklund Olesen	0419ed395c	Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr. A MOVCCr instruction can be commuted by inverting the condition. This can help reduce register pressure and remove unnecessary copies in some cases. <rdar://problem/11182914> llvm-svn: 154033	2012-04-04 18:23:42 +00:00
Jakob Stoklund Olesen	97f47c37b6	Allocate virtual registers in ascending order. This is just the fallback tie-breaker ordering, the main allocation order is still descending size. Patch by Shamil Kurmangaleev! llvm-svn: 153904	2012-04-02 22:30:39 +00:00
Lang Hames	dbc3175c89	During two-address lowering, rescheduling an instruction does not untie operands. Make TryInstructionTransform return false to reflect this. Fixes PR11861. llvm-svn: 153892	2012-04-02 19:58:43 +00:00
Nadav Rotem	2729f54295	This commit contains a few changes that had to go in together. 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) (and also scalar_to_vector). 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src). Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B)) 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y). 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive. Code which was previously compiled to this: movd (%rsi), %xmm0 movdqa .LCPI0_0(%rip), %xmm2 pshufb %xmm2, %xmm0 movd (%rdi), %xmm1 pshufb %xmm2, %xmm1 pxor %xmm0, %xmm1 pshufb .LCPI0_1(%rip), %xmm1 movd %xmm1, (%rdi) ret Now compiles to this: movl (%rsi), %eax xorl %eax, (%rdi) ret llvm-svn: 153848	2012-04-01 19:31:22 +00:00
Evan Cheng	f3c23907f5	ARM target should allow codegenprep to duplicate ret instructions to enable tailcall opt. rdar://11140249 llvm-svn: 153717	2012-03-30 01:24:39 +00:00
Lang Hames	5685c98688	Change the constant in this testcase so that it results in a constant pool load. llvm-svn: 153704	2012-03-29 23:52:38 +00:00
Evan Cheng	72cf12e416	ARM has a peephole optimization which looks for a def / use pair. The def produces a 32-bit immediate which is consumed by the use. It tries to fold the immediate by breaking it into two parts and fold them into the immmediate fields of two uses. e.g movw r2, #40885 movt r3, #46540 add r0, r0, r3 => add.w r0, r0, #3019898880 add.w r0, r0, #30146560 ; However, this transformation is incorrect if the user produces a flag. e.g. movw r2, #40885 movt r3, #46540 adds r0, r0, r3 => add.w r0, r0, #3019898880 adds.w r0, r0, #30146560 Note the adds.w may not set the carry flag even if the original sequence would. rdar://11116189 llvm-svn: 153484	2012-03-26 23:31:00 +00:00
Eli Bendersky	3ef88c1833	Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu * Removed test/lib/llvm.exp - it is no longer needed * Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files left in the test suite so this code is no longer required. test/lit.cfg is now much shorter and clearer * Removed a lot of duplicate code in lit.local.cfg files that need access to the root configuration, by adding a "root" attribute to the TestingConfig object. This attribute is dynamically computed to provide the same information as was previously provided by the custom getRoot functions. * Documented the config.root attribute in docs/CommandGuide/lit.pod llvm-svn: 153408	2012-03-25 09:02:19 +00:00
Andrew Trick	121e3e153c	Remove -enable-lsr-nested in time for 3.1. Tests cases have been removed but attached to open PR12330. llvm-svn: 153286	2012-03-22 22:42:45 +00:00
Chad Rosier	ce3a78be60	[fast-isel] Fold "urem x, pow2" -> "and x, pow2-1". This should fix the 271% execution-time regression for nsieve-bits on the ARMv7 -O0 -g nightly tester. This may also improve compile-time on architectures that would otherwise generate a libcall for urem (e.g., ARM) or fall back to the DAG selector. rdar://10810716 llvm-svn: 153230	2012-03-22 00:21:17 +00:00

1 2 3 4 5 ...

1277 Commits