llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00

Author	SHA1	Message	Date
Benjamin Kramer	c65f53aca8	[BranchProbability] Manually round the floating point output. llvm::format compiles down to snprintf which has no defined rounding for floating point arguments, and MSVC has implemented it differently from what the BSD libcs and glibc do. Try to emulate the glibc rounding behavior to avoid changing tests. While there simplify code a bit and move trivial methods inline. llvm-svn: 248665	2015-09-26 10:09:36 +00:00
Matt Arsenault	fb1ff93ba4	AMDGPU: Remove hasPostISelHook from most instructions Since this is only needed for VOP3 and a few other special case instructions, stop setting it on everything. llvm-svn: 248657	2015-09-26 05:06:48 +00:00
Matt Arsenault	16b445f6b4	AMDGPU: Switch over reg class size instead of checking all super classes This gets isSGPRClass out of my profile of SIFixSGPRCopies. llvm-svn: 248656	2015-09-26 04:59:04 +00:00
Matt Arsenault	21b183d12c	AMDGPU: Don't handle invalid reg classes in helper functions No tests hit these and it would be better to have checks like this explicit where they are used. llvm-svn: 248655	2015-09-26 04:53:30 +00:00
Saleem Abdulrasool	167c693a73	AMDGPU: address -Winconsistent-missing-override Add missing override. NFC. llvm-svn: 248652	2015-09-26 04:34:52 +00:00
Matt Arsenault	8248804482	AMDGPU: Set CopyCost of register classes These require multiple mov instructions to copy, but the default value is that 1 instruction is needed. I'm not sure if this actually changes anything. llvm-svn: 248651	2015-09-26 04:09:34 +00:00
Chen Li	11db69b00f	[Bug 24848] Use range metadata to constant fold comparisons between two values Summary: This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848. If both operands of a comparison have range metadata, they should be used to constant fold the comparison. Reviewers: sanjoy, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13177 llvm-svn: 248650	2015-09-26 03:26:47 +00:00
Matt Arsenault	eb0d6b9ea5	AMDGPU: VOP3b definition cleanups llvm-svn: 248647	2015-09-26 02:25:48 +00:00
Matt Arsenault	8a568ce423	AMDGPU: Fix sched model for VOP2b instructions Trying to use the version with the explicit output operand would complain because of the missing WriteSALU. I'm not sure why it doesn't complain about this with the implicit VCC def. llvm-svn: 248646	2015-09-26 02:25:45 +00:00
Dan Gohman	33947eb625	[WebAssembly] Rename several functions and types according to the new spec. llvm-svn: 248644	2015-09-26 01:09:44 +00:00
Ahmed Bougacha	5e8f57b519	[ARM] Don't generate clrex for pre-v7 targets. Since r248294, we emit clrex, but it doesn't exist on v6. llvm-svn: 248640	2015-09-26 00:14:02 +00:00
Sanjoy Das	a43b643107	[SCEV] Reapply 'Teach isLoopBackedgeGuardedByCond to exploit trip counts' Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 The original checkin, r248608 had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248638	2015-09-25 23:53:50 +00:00
Sanjoy Das	e7714ded02	[SCEV] Reapply 'Exploit A < B => (A+K) < (B+K) when possible' Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. The original checkin, r248606 had to be backed out due to an issue with an ObjCXX unit test. That issue is now fixed, so re-landing. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248637	2015-09-25 23:53:45 +00:00
Matthias Braun	ad833f6380	LivePhysRegs: Fix live-outs of return blocks I realized that the live-out set computed for the return block is missing the callee saved registers (the non-pristine ones to be exact). This only affects the liveness computed for instructions inside the function epilogue which currently none of the LivePhysRegs users in llvm cares about, so this is just a drive-by fix without a testcase. Differential Revision: http://reviews.llvm.org/D13180 llvm-svn: 248636	2015-09-25 23:50:53 +00:00
Sanjay Patel	61ecb7fad2	[InstCombine] match De Morgan's Law hidden by zext ops (PR22723) This is a fix for PR22723: https://llvm.org/bugs/show_bug.cgi?id=22723 My first attempt at this was to change what I thought was the root problem: xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32 ...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop! My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would mean potentially returning a different size for the match than what was input. I think this would require all users of m_Not to check the size of the returned match, so I abandoned that idea. I settled on just fixing the exact case presented in the PR. This patch does allow the 2 functions in PR22723 to compile identically (x86): bool test(bool x, bool y) { return !x | !y; } bool test(bool x, bool y) { return !x || !y; } ... andb %sil, %dil xorb $1, %dil movb %dil, %al retq Differential Revision: http://reviews.llvm.org/D12705 llvm-svn: 248634	2015-09-25 23:21:38 +00:00
Cong Hou	3919ffc012	Use fixed-point representation for BranchProbability. BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change: Pros: 1. It uses only half the space of the current one. 2. Some operations are much faster, such as addition, subtraction, comparison, and scaling by an integer. Cons: 1. Constructing a probability using arbitrary numerator and denominator needs additional calculations. 2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 2/3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio. One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability takes more space, which may be a concern. Differential revision: http://reviews.llvm.org/D12603 llvm-svn: 248633	2015-09-25 23:09:59 +00:00
Matthias Braun	1e4160c6f7	SelectionDAGDumper: Print simple operands inline. Print simple operands inline instead of their pointer/value number. Simple operands are SDNodes without predecessors like Constant(FP), Register, UNDEF. This unifies the behaviour with dumpr() which was already doing this. Previously: t0: ch = EntryToken t1: i64 = Register %vreg0 t2: i64,ch = CopyFromReg t0, t1 t3: i64 = Constant<1> t4: i64 = add t2, t3 t5: i64 = Constant<2> t6: i64 = add t2, t5 t10: i64 = undef t11: i8,ch = load t0, t2, t10<LD1[%tmp81]> t12: i8,ch = load t0, t4, t10<LD1[%tmp10]> t13: i8,ch = load t0, t6, t10<LD1[%tmp12]> Now: t0: ch = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0 t4: i64 = add t2, Constant:i64<1> t6: i64 = add t2, Constant:i64<2> t11: i8,ch = load<LD1[%tmp81]> t0, t2, undef:i64 t12: i8,ch = load<LD1[%tmp10]> t0, t4, undef:i64 t13: i8,ch = load<LD1[%tmp12]> t0, t6, undef:i64 Differential Revision: http://reviews.llvm.org/D12567 llvm-svn: 248628	2015-09-25 22:27:02 +00:00
Matt Arsenault	04fdcb1cfc	AMDGPU: Construct new buffer instruction when moving SMRD It's easier to understand creating a full instruction than the current situation where sometimes a new instruction is created and sometimes it is awkwardly mutated in place. llvm-svn: 248627	2015-09-25 22:21:19 +00:00
Matt Arsenault	30ba1a00e6	DAGCombiner: Check if store is volatile first This is the simpler check. NFC. llvm-svn: 248625	2015-09-25 22:06:19 +00:00
Matthias Braun	18d6f29c07	TargetRegisterInfo: Introduce PrintLaneMask. This makes it more convenient to print lane masks and leads to more uniform printing. llvm-svn: 248624	2015-09-25 21:51:24 +00:00
Matthias Braun	744bb44288	TargetRegisterInfo: Add typedef unsigned LaneBitmask and use it where appropriate; NFC llvm-svn: 248623	2015-09-25 21:51:14 +00:00
Sanjay Patel	33bcd3de54	merge vector stores into wider vector stores and fix AArch64 misaligned access TLI hook (PR21711) This is a redo of D7208 ( r227242 - http://llvm.org/viewvc/llvm-project?view=revision&revision=227242 ). The patch was reverted because an AArch64 target could infinite loop after the change in DAGCombiner to merge vector stores. That happened because AArch64's allowsMisalignedMemoryAccesses() wasn't telling the truth. It reported all unaligned memory accesses as fast, but then split some 128-bit unaligned accesses up in performSTORECombine() because they are slow. This patch attempts to fix the problem in AArch64's allowsMisalignedMemoryAccesses() while preserving existing (perhaps questionable) lowering behavior. The x86 test shows that store merging is working as intended for a target with fast 32-byte unaligned stores. Differential Revision: http://reviews.llvm.org/D12635 llvm-svn: 248622	2015-09-25 21:49:48 +00:00
Matthias Braun	913f3edce2	PrologueEpilogInserter: Fix missing live-ins when savepoint equals restorepoint The algorithm would not modify the live-in list of blocks below the save block point which is correct unless it happens to be a restore point at the same time. Also fixes the benign issue of live-in registers being added twice in some cases. The testcase is based on a test submitted by Kit Barton. Differential Revision: http://reviews.llvm.org/D13176 llvm-svn: 248620	2015-09-25 21:41:40 +00:00
Tom Stellard	c6bc4ec163	AMDGPU/SI: Use .hsatext section instead of .text for HSA Reviewers: arsenm, grosbach, rafael Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D12424 llvm-svn: 248619	2015-09-25 21:41:28 +00:00
Tom Stellard	b419e6733b	MCAsmInfo: Allow targets to specify when the .section directive should be omitted Summary: The default behavior is to omit the .section directive for .text, .data, and sometimes .bss, but some targets may want to omit this directive for other sections too. The AMDGPU backend will use this to emit a simplified syntax for section switches. For example if the section directive is not omitted (current behavior), section switches to .hsatext will be printed like this: .section .hsatext,#alloc,#execinstr,#write This is actually wrong, because .hsatext has some custom STT_* flags, which MC doesn't know how to print or parse. If the section directive is omitted (made possible by this commit), section switches will be printed like this: .hsatext The motivation for this patch is to make it possible to emit sections with custom STT_* flags without having to teach MC about all the target specific STT_* flags. Reviewers: rafael, grosbach Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12423 llvm-svn: 248618	2015-09-25 21:41:14 +00:00
Matthias Braun	c0bbfcbec3	MachineBasicBlock: Factor out common code into isReturnBlock() llvm-svn: 248617	2015-09-25 21:25:19 +00:00
Sanjoy Das	3c388c3b77	Revert two SCEV changes that caused test failures in clang. r248606: "[SCEV] Exploit A < B => (A+K) < (B+K) when possible" r248608: "[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts." llvm-svn: 248614	2015-09-25 21:16:50 +00:00
Justin Bogner	8ba397b73b	ADCE: Fix typo in file comment. NFC llvm-svn: 248613	2015-09-25 21:03:46 +00:00
Matt Arsenault	8a987c4789	PeepholeOptimizer: Remove redundant copies If a virtual register is copied and another copy was already seen, replace with the previous copy. This only handles the simplest cases for now. This pattern shows up from various operand restrictions AMDGPU has which require inserting copies depending on the register class of the operands. llvm-svn: 248611	2015-09-25 20:22:12 +00:00
Chad Rosier	37bb064ae2	Simplify code. NFC. llvm-svn: 248610	2015-09-25 20:20:22 +00:00
Sanjay Patel	392541a337	more space; NFC llvm-svn: 248609	2015-09-25 20:12:43 +00:00
Sanjoy Das	330be54a7e	[SCEV] Teach isLoopBackedgeGuardedByCond to exploit trip counts. Summary: If the trip count of a specific backedge is `N`, then we know that backedge is effectively guarded by the condition `{0,+,1} u< N`. This change teaches SCEV to use this condition to prove things in `isLoopBackedgeGuardedByCond`. Depends on D12948 Depends on D12949 Reviewers: atrick, reames, majnemer, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12950 llvm-svn: 248608	2015-09-25 19:59:57 +00:00
Sanjoy Das	a5eb19dad8	[SCEV] Extract helper function from isImpliedCond; NFC Summary: This new helper routine will be used in a subsequent change. Reviewers: hfinkel Subscribers: hfinkel, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12949 llvm-svn: 248607	2015-09-25 19:59:52 +00:00
Sanjoy Das	6649494805	[SCEV] Exploit A < B => (A+K) < (B+K) when possible Summary: This change teaches SCEV's `isImpliedCond` two new identities: A u< B u< -C => (A + C) u< (B + C) A s< B s< INT_MIN - C => (A + C) s< (B + C) While these are useful on their own, they're really intended to support D12950. Reviewers: atrick, reames, majnemer, nlewycky, hfinkel Subscribers: aadg, sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12948 llvm-svn: 248606	2015-09-25 19:59:49 +00:00
Matt Arsenault	5b7567e45f	AMDGPU: Add some more tests for literal operands llvm-svn: 248600	2015-09-25 18:21:47 +00:00
Matt Arsenault	5a0a835783	AMDGPU: Make getNamedOperandIdx declaration readonly This matches how it is defined in the generated implementation. llvm-svn: 248598	2015-09-25 18:09:15 +00:00
Chad Rosier	bae59cd01a	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	24d34e75a5	AMDGPU: Disable some passes that are not meaningful Don't run passes related to stack maps, garbage collection, exceptions since these aren't useful for GPUs. There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or I'm not sure how to disable (SafeStack and StackProtector) llvm-svn: 248591	2015-09-25 17:41:20 +00:00
Matt Arsenault	a9d7b4e305	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	43346ab61e	AMDGPU: Fix recomputing dominator tree unnecessarily SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587	2015-09-25 17:21:28 +00:00
Matt Arsenault	a22e195f0c	AMDGPU: Re-justify workaround and fix worked around problem When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585	2015-09-25 17:08:42 +00:00
Matt Arsenault	fc8f81bb42	AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources This avoids needing to re-legalize the new REG_SEQUENCE. llvm-svn: 248584	2015-09-25 17:08:40 +00:00
Matt Arsenault	436e0fa574	AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos This was only set on the final _si/_vi version, but not on the pseudos most of codegen sees. No test since these instructions aren't used yet. llvm-svn: 248583	2015-09-25 16:58:27 +00:00
Matt Arsenault	7377cbeef9	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
James Molloy	07acfd5604	[GlobalsAA] Teach GlobalsAA about nocapture Arguments to function calls marked "nocapture" can be marked as non-escaping. However, nocapture is defined in terms of the lifetime of the callee, and if the callee can directly or indirectly recurse to the caller, the semantics of nocapture are invalid. Therefore, we eagerly discover which SCC each function belongs to, and later can check if callee and caller of a callsite belong to the same SCC, in which case there could be recursion. This means that we can't be so optimistic in getModRefInfo(ImmutableCallsite) - previously we assumed all call arguments never aliased with an escaping global. Now we need to check, because a global could now be passed as an argument but still not escape. This also solves a related conformance problem: MemCpyOptimizer can turn non-escaping stores of globals into calls to intrinsics like llvm.memcpy/llvm.memset. This confuses GlobalsAA, which knows the global can't escape and so returns NoModRef when queried, when obviously a memcpy/memset call does indeed reference and modify its arguments. This fixes PR24800, PR24801, and PR24802. llvm-svn: 248576	2015-09-25 15:39:29 +00:00
Saleem Abdulrasool	f917c9023e	ARM: make -Asserts,-Werror=unused-variable build happy The value was only used in an assertion. Sink the variable usage into the assertion. llvm-svn: 248562	2015-09-25 05:41:02 +00:00
Saleem Abdulrasool	053ba321cc	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Matt Arsenault	ecdbae22a1	AMDGPU: Remove unused includes llvm-svn: 248553	2015-09-25 00:28:43 +00:00
Sanjoy Das	2ef1693425	[LangRef] Unbreak the docs Sphinx build. r248551 introduced some breakage due to incorrectly terminated ``literals`` s. llvm-svn: 248552	2015-09-25 00:05:40 +00:00
Sanjoy Das	4a1a429535	[Bitcode][Asm] Teach LLVM to read and write operand bundles. Summary: This also adds the first set of tests for operand bundles. The optimizer has not been audited to ensure that it does the right thing with operand bundles. Depends on D12456. Reviewers: reames, chandlerc, majnemer, dexonsmith, kmod, JosephTremoulet, rnk, bogner Subscribers: maksfb, llvm-commits Differential Revision: http://reviews.llvm.org/D12457 llvm-svn: 248551	2015-09-24 23:34:52 +00:00
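
Note on r248633 (fixed-point BranchProbability): the trade-offs listed in that commit message can be illustrated with a small sketch. This is not LLVM's BranchProbability class; the type `FixedProb`, its member names, and the round-to-nearest construction are assumptions made for illustration. Only the constant denominator 1<<31 and the uint32_t numerator come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type, NOT LLVM's BranchProbability.
// A probability is stored as a 32-bit numerator over the constant
// denominator D = 1 << 31, so only one uint32_t is needed.
class FixedProb {
  uint32_t N;
  explicit FixedProb(uint32_t Numerator) : N(Numerator) {}

public:
  static constexpr uint32_t D = 1u << 31;

  // Constructing from an arbitrary fraction needs extra arithmetic
  // (a 64-bit multiply and divide). Rounding to nearest is an assumption.
  static FixedProb fromFraction(uint32_t Num, uint32_t Den) {
    assert(Den != 0 && Num <= Den && "probability must lie in [0, 1]");
    uint64_t Scaled = (uint64_t(Num) * D + Den / 2) / Den;
    return FixedProb(uint32_t(Scaled));
  }

  uint32_t numerator() const { return N; }

  // With a shared denominator, these are plain integer operations.
  FixedProb complement() const { return FixedProb(D - N); }
  bool operator<(FixedProb O) const { return N < O.N; }
  bool operator==(FixedProb O) const { return N == O.N; }

  // Addition saturates at probability 1 (an assumption of this sketch).
  FixedProb operator+(FixedProb O) const {
    uint64_t Sum = uint64_t(N) + O.N;
    if (Sum > D)
      Sum = D;
    return FixedProb(uint32_t(Sum));
  }

  // Scaling an integer is a multiply and a shift by 31 bits.
  uint32_t scale(uint32_t Value) const {
    return uint32_t((uint64_t(Value) * N) >> 31);
  }
};
```

In this sketch 1/3 cannot be represented exactly, so `fromFraction(1, 3) + fromFraction(1, 3)` differs from `fromFraction(2, 3)` by one unit in the numerator, which is the kind of precision loss the commit message lists under its cons.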

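Note on r248606 / r248637 (SCEV `isImpliedCond` identities): the unsigned identity `A u< B u< -C => (A + C) u< (B + C)` can be checked exhaustively at a small bit width. The sketch below does this for 8-bit values with wrapping arithmetic. The function names and the choice of 8 bits are assumptions for illustration; none of this is SCEV code, and the signed identity is not covered.

```cpp
#include <cassert>
#include <cstdint>

// True if the implication holds for one 8-bit triple:
//   premise:    A u< B  and  B u< -C   (with -C computed modulo 256)
//   conclusion: (A + C) u< (B + C)     (sums computed modulo 256)
bool unsignedIdentityHolds(uint8_t A, uint8_t B, uint8_t C) {
  uint8_t NegC = uint8_t(0u - C);
  bool Premise = A < B && B < NegC;
  bool Conclusion = uint8_t(A + C) < uint8_t(B + C);
  return !Premise || Conclusion;
}

// Same check without the bound on B, to show why the bound is needed:
// if B + C wraps around, the ordering can flip.
bool naiveIdentityHolds(uint8_t A, uint8_t B, uint8_t C) {
  bool Premise = A < B;
  bool Conclusion = uint8_t(A + C) < uint8_t(B + C);
  return !Premise || Conclusion;
}

// Counts the 8-bit triples (A, B, C) that violate the bounded identity.
unsigned countUnsignedCounterexamples() {
  unsigned Count = 0;
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      for (unsigned C = 0; C < 256; ++C)
        if (!unsignedIdentityHolds(uint8_t(A), uint8_t(B), uint8_t(C)))
          ++Count;
  return Count;
}
```

The condition `B u< -C` guarantees that `B + C` does not wrap, so adding `C` to both sides preserves the order. Dropping it admits counterexamples such as A = 1, B = 255, C = 1, where `B + C` wraps to 0.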

121973 Commits