llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00

Author	SHA1	Message	Date
Michael Zuckerman	ae040817a7	[X86] Add support for mmword memory operand size for Intel-syntax x86 assembly Differential Revision: http://reviews.llvm.org/D12151 llvm-svn: 245835	2015-08-24 10:26:54 +00:00
Scott Douglass	2a6e523fef	[ARM] Use AEABI helpers for i64 div and rem Differential Revision: http://reviews.llvm.org/D12232 llvm-svn: 245830	2015-08-24 09:17:18 +00:00
Scott Douglass	abc35dc1e3	[ARM] Refactor LowerDivRem before adding LowerREM (nfc) Differential Revision: http://reviews.llvm.org/D12230 llvm-svn: 245829	2015-08-24 09:17:11 +00:00
Michael Zuckerman	4f0060b27e	first commit to llvm llvm-svn: 245825	2015-08-24 07:48:50 +00:00
Mehdi Amini	3cb178c5fa	Add missing break in AArch64DAGToDAGISel::Select() switch case Reported by coverity. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 245800	2015-08-23 00:42:57 +00:00
Jingyue Wu	2549889c79	[NVPTX] Allow undef value as global initializer Summary: __shared__ variable may now emit undef value as initializer, do not throw error on that. Test Plan: test/CodeGen/NVPTX/global-addrspace.ll Patch by Xuetian Weng Reviewers: jholewinski, tra, jingyue Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D12242 llvm-svn: 245785	2015-08-22 05:40:26 +00:00
Matt Arsenault	42bf1dc33c	AMDGPU: Allow specifying different opcode on VI for SMRD/SMEM Although the basic s_load_* instructions happen to use the same opcode, some of the special case SMRD instructions have different opcodes. llvm-svn: 245775	2015-08-22 00:54:31 +00:00
Matt Arsenault	3784a7252a	AMDGPU: Improve accuracy of instruction rates for some FP instructions llvm-svn: 245774	2015-08-22 00:50:41 +00:00
Matt Arsenault	2211a78780	AMDGPU: Use DFS to avoid second loop over function llvm-svn: 245772	2015-08-22 00:43:38 +00:00
Matt Arsenault	c8ff6e4f0e	AMDGPU: Make sure to run verifier after SIFixSGPRLiveRanges llvm-svn: 245769	2015-08-22 00:19:34 +00:00
Matt Arsenault	12b207c6f3	AMDGPU: Improve debug printing in SIFixSGPRLiveRanges llvm-svn: 245768	2015-08-22 00:19:25 +00:00
Matt Arsenault	d80c9718c1	AMDGPU: Move CI instructions into CIInstructions.td There are still a couple of CI patterns left in SIInstructions. llvm-svn: 245767	2015-08-22 00:16:34 +00:00
Matt Arsenault	30e4b51f0a	AMDGPU: Minor cleanups to help with f16 support The main change is inverting the condition for the operand class classes so that VT.Size == 16 uses VGPR_32 instead of 64. llvm-svn: 245764	2015-08-21 23:49:51 +00:00
Tom Stellard	c3f6130f41	AMDGPU/SI: Better handle s_wait insertion We can wait on either VM, EXP or LGKM. The waits are independent. Without this patch, a wait inserted because of one of them would also wait for all the previous others. This patch makes s_wait only wait for the ones we need for the next instruction. Here's an example of subtle perf reduction this patch solves: This is without the patch: buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen s_load_dwordx4 s[44:47], s[8:9], 0xc s_waitcnt lgkmcnt(0) buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen s_load_dwordx4 s[48:51], s[8:9], 0x10 s_waitcnt vmcnt(1) buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen The s_waitcnt vmcnt(1) is useless. The reason it is added is because the last buffer_load_format_xyzw needs s[44:47], which was issued by the first s_load_dwordx4. It waits for all VM before that call to have finished. Internally, 3 counters (for VM, EXP and LGKM) are updated after every instruction. For example buffer_load_format_xyzw will increase the VM counter, and s_load_dwordx4 the LGKM one. Without the patch, for every defined register, the current 3 counters are stored, and are used to know how long to wait when an instruction needs the register. Because of that, the counters stored for s[44:47] record that using the register requires waiting for the previous buffer_load_format_xyzw. Instead this patch stores only the counters that matter for the register, and puts zero for the other ones, since we don't need any wait for them. Patch by: Axel Davy Differential Revision: http://reviews.llvm.org/D11883 llvm-svn: 245755	2015-08-21 22:47:27 +00:00
Vedant Kumar	5213c133c7	[ARM] Fix MachO CPU Subtype selection Differential Revision: http://reviews.llvm.org/D12040 llvm-svn: 245744	2015-08-21 21:52:48 +00:00
Hal Finkel	8f05a818d7	[PowerPC] PPCVSXFMAMutate should not segfault on undef input registers When PPCVSXFMAMutate would look at the input addend register, it would get its input value number. This would fail, however, if the register was undef, causing a segfault. Don't segfault (just skip such FMA instructions). Fixes the test case from PR24542 (although that may have been over-reduced). llvm-svn: 245741	2015-08-21 21:34:24 +00:00
Sanjay Patel	1562e12359	[x86] enable machine combiner reassociations for 256-bit vector min/max llvm-svn: 245735	2015-08-21 21:04:21 +00:00
Sanjay Patel	256ad9fa9f	remove 'FeatureSlowUAMem' from AMD CPUs based on 10H micro-arch or later See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data. llvm-svn: 245733	2015-08-21 20:39:17 +00:00
Sanjay Patel	f63481a93c	[x86] invert logic for attribute 'FeatureFastUAMem' This is a 'no functional change intended' patch. It removes one FIXME, but adds several more. Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however, you can see that we're not consistent about this. Changing the name of the attribute makes it clearer to see the logic holes. Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute to new chips; fast unaligned accesses have been standard for several generations of CPUs now. Differential Revision: http://reviews.llvm.org/D12154 llvm-svn: 245729	2015-08-21 20:17:26 +00:00
Sanjay Patel	2e11124ece	[x86] enable machine combiner reassociations for 128-bit vector min/max llvm-svn: 245715	2015-08-21 18:06:49 +00:00
Eric Christopher	11122e529d	Fix typo - symetric -> symmetric. llvm-svn: 245705	2015-08-21 16:23:39 +00:00
James Y Knight	af99ad073e	[Sparc] Support user-specified stack object overalignment. Note: I do not implement a base pointer, so it's still impossible to have dynamic realignment AND dynamic alloca in the same function. This also moves the code for determining the frame index reference into getFrameIndexReference, where it belongs, instead of inline in eliminateFrameIndex. [Begin long-winded screed] Now, stack realignment for Sparc is actually a silly thing to support, because the Sparc ABI has no need for it -- unlike the situation on x86, the stack is ALWAYS aligned to the required alignment for the CPU instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9. However, LLVM unfortunately implements user-specified overalignment using stack realignment support, so for now, I'm going to go along with that tradition. GCC instead treats objects which have alignment specification greater than the maximum CPU-required alignment for the target as a separate block of stack memory, with their own virtual base pointer (which gets aligned). Doing it that way avoids needing to implement per-target support for stack realignment, except for the targets which actually have an ABI-specified stack alignment which is too small for the CPU's requirements. Further unfortunately in LLVM, the default canRealignStack for all targets effectively returns true, even though implementing that is something a target needs to do specifically. So, the previous behavior on Sparc was to silently ignore the user's specified stack alignment. Ugh. Yet MORE unfortunate, if a target actually does return false from canRealignStack, that also causes the user-specified alignment to be silently ignored, rather than emitting an error. (I started looking into fixing that last, but it broke a bunch of tests, because LLVM actually depends on having it silently ignored: some architectures (e.g. non-linux i386) have smaller stack alignment than spilled-register alignment. But, the fact that a register needs spilling is not known until within the register allocator. And by that point, the decision to not reserve the frame pointer has been frozen in place. And without a frame pointer, stack realignment is not possible. So, canRealignStack() returns false, and needsStackRealignment() then returns false, assuming everyone can just go on their merry way assuming the alignment requirements were probably just suggestions after all. Sigh...) Differential Revision: http://reviews.llvm.org/D12208 llvm-svn: 245668	2015-08-21 04:17:56 +00:00
NAKAMURA Takumi	8132fe1e68	SparcAsmParser.cpp: Appease msc x86. llvm-svn: 245661	2015-08-21 01:12:19 +00:00
Matthias Braun	c987327a30	AArch64: Fix cmp;ccmp ordering When producing conditional compare sequences for or operations we need to negate the operands and the finally tested flags. The thing is if we negate the finally tested flags this equals a logical negation of all previously emitted expressions. There was a case missing where we have to order OR expressions so they get emitted first. This fixes http://llvm.org/PR24459 llvm-svn: 245641	2015-08-20 23:33:34 +00:00
Matthias Braun	6be5f20671	AArch64: Do not create CCMP on multiple users. Creating CMP;CCMP sequences from and/or trees does not gain us anything if the and/or tree is materialized to a GP register anyway. While most of the code already checked for hasOneUse() there was one important case missing. llvm-svn: 245640	2015-08-20 23:33:31 +00:00
Dan Gohman	e759419680	[WebAssembly] Mark more operators as Expand. llvm-svn: 245636	2015-08-20 22:57:13 +00:00
Ahmed Bougacha	f40aa3d262	[X86] Look for scalar through one bitcast when lowering to VBROADCAST. Fixes PR23464: one way to use the broadcast intrinsics is: _mm256_broadcastw_epi16(_mm_cvtsi32_si128((int)src)); We don't currently fold this, but now that we use native IR for the intrinsics (r245605), we can look through one bitcast to find the broadcast scalar. Differential Revision: http://reviews.llvm.org/D10557 llvm-svn: 245613	2015-08-20 21:02:39 +00:00
Jingyue Wu	7623c0ca2c	[NVPTX] truncating 64-bit to 32-bit is free Summary: Add an LSR test that exercises isTruncateFree. Without this change, LSR creates another indvar representing the truncated value. Reviewers: jholewinski, eliben Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D12058 llvm-svn: 245611	2015-08-20 20:59:02 +00:00
Ahmed Bougacha	eb8094c580	[X86] Replace avx2 broadcast intrinsics with native IR. Since r245605, the clang headers don't use these anymore. r245165 updated some of the tests already; update the others, add an autoupgrade, remove the intrinsics, and cleanup the definitions. Differential Revision: http://reviews.llvm.org/D10555 llvm-svn: 245606	2015-08-20 20:36:19 +00:00
James Molloy	8f25429fe1	[ARM] Don't try and custom lower a vNi64 SETCC. It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes. Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009. llvm-svn: 245577	2015-08-20 16:33:44 +00:00
Douglas Katzman	71580abeba	[Sparc]: correct the 'set' synthetic instruction Differential Revision: http://reviews.llvm.org/D12194 llvm-svn: 245575	2015-08-20 16:16:16 +00:00
Marina Yatsina	4f67f6d0b5	[X86] Fix FBLD and FBSTP FBLD and FBSTP should receive TBYTE because it is defined as FBLD m80 FBSTP m80 Differential Revision: http://reviews.llvm.org/D11748 llvm-svn: 245553	2015-08-20 11:51:24 +00:00
Marina Yatsina	668150fc2f	[X86] Fix bug in COMISD and COMISS definition in td files COMISD should receive QWORD because it is defined as (V)COMISD xmm1, xmm2/m64 COMISS should receive DWORD because it is defined as (V)COMISS xmm1, xmm2/m32 Differential Revision: http://reviews.llvm.org/D11712 llvm-svn: 245551	2015-08-20 11:21:36 +00:00
David Majnemer	63faa87008	[X86] Fix the (shl (and (setcc_c), c1), c2) -> (and setcc_c, (c1 << c2)) fold We didn't check for the necessary preconditions before folding a mask/shift into a single mask. This fixes PR24516. llvm-svn: 245544	2015-08-20 09:00:56 +00:00
Hal Finkel	b8e2581942	[PowerPC] Fix value type on XVCMPEQDP for v2f64 comparisons XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs to be v2i64 (as that's the corresponding SETCC type). Fixes PR24225. llvm-svn: 245535	2015-08-20 03:02:02 +00:00
Hal Finkel	c4b6cee81f	[PowerPC] Fix the int2fp(fp2int(x)) DAGCombine to ignore ppc_fp128 This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128 operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)), but shouldn't (it should only apply to f32/f64 types). The result was a crash. llvm-svn: 245530	2015-08-20 01:18:20 +00:00
Sanjay Patel	46cb7a6fba	[x86] enable machine combiner reassociations for scalar double-precision min/max llvm-svn: 245506	2015-08-19 21:27:27 +00:00
Sanjay Patel	541519d64a	[x86] enable machine combiner reassociations for scalar single-precision maximums llvm-svn: 245504	2015-08-19 21:18:46 +00:00
Juergen Ributzka	c803170766	[AArch64][FastISel] Don't fold shifts with UB. We are already falling back to SelectionDAG when encountering a shift with UB. This adds the same checks for shifts with UB that get folded into arithmetic or logical operations. This fixes rdar://problem/22345295. llvm-svn: 245499	2015-08-19 20:52:55 +00:00
David Majnemer	89452fd5e9	[X86] Emit more efficient >= comparisons against 0 We don't do a great job with >= 0 comparisons against zero when the result is used as an i8. Given something like: void f(long long LL, bool *B) { *B = LL >= 0; } We used to generate: shrq $63, %rdi xorb $1, %dil movb %dil, (%rsi) Now we generate: testq %rdi, %rdi setns (%rsi) Differential Revision: http://reviews.llvm.org/D12136 llvm-svn: 245498	2015-08-19 20:51:40 +00:00
Dan Gohman	d4dcb44551	[WebAssembly] Use the default alignment for SIMD types. Previously WebAssembly's datalayout string had -v128:8:128. This had been an attempt to declare a certain level of support for unaligned SIMD accesses. However, clang makes its own determinations for SIMD alignment that are independent of the datalayout string, so this wasn't actually meaningful. llvm-svn: 245494	2015-08-19 20:30:20 +00:00
Douglas Katzman	e22dae3b05	[Sparc]: asm-only support for the ldstub instruction. llvm-svn: 245485	2015-08-19 19:30:57 +00:00
Nemanja Ivanovic	9f042b959b	Temporary fix for the self-host failures introduced by rL244921. This revision has introduced an issue that only affects bootstrapped compiler when it is printing the ASM. I am working on resolving the issue, but in the meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64 and the associated testing until I can get this fixed. llvm-svn: 245481	2015-08-19 19:04:47 +00:00
Bruno Cardoso Lopes	aa4e725a3a	[PeepholeOptimizer] Look through PHIs to find additional register sources Reintroduce r245442. Remove an overly conservative assertion introduced in r245442. We could replace the assertion to use `shareSameRegisterFile` instead, but at that point in `insertPHI` we already lost the original Def subreg to check against. So drop the assertion completely. Original commit message: - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returned by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245479	2015-08-19 18:53:36 +00:00
Douglas Katzman	fe5dd9c0cd	[SPARC] Enable writing to floating-point-state register. llvm-svn: 245475	2015-08-19 18:34:48 +00:00
Ahmed Bougacha	38ac7b9594	[AArch64] Improve short-form diags on long-form Match_InvalidOperand. Since r244955, we try to use the short-form ErrorInfo when both tries failed, and the long-form match failed on a suffix operand. However, this means we sometimes mix ErrorInfo and MatchResult (one manifestation of this being PR24498). Instead, restore both. llvm-svn: 245469	2015-08-19 17:40:19 +00:00
Renato Golin	9832a30abc	Revert "[AArch64] Simplify/refactor code to ease code review. NFC." This reverts commit r245443, as it broke AArch64 test-suite tramp3d with an assert "Reg && "Null register has no regunits". llvm-svn: 245455	2015-08-19 16:29:53 +00:00
Derek Schuff	5d333b2d27	x32. Fixes a bug in x32 exception handling. This patch updates the X86 lowering so that the Exception Pointer and Selector are 64-bit wide only if Subtarget.isTarget64BitLP64. Patch by João Porto Reviewers: dschuff, rnk Differential Revision: http://reviews.llvm.org/D12111 llvm-svn: 245454	2015-08-19 16:28:21 +00:00
JF Bastien	3ff38c2bd2	x32. Fixes jmp %reg in x32 x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64-bits. Author: jpp Reviewers: jfb, dschuff, pavel.v.chupin Subscribers: llvm-commits Differential revision: http://reviews.llvm.org/D12112 llvm-svn: 245452	2015-08-19 16:17:08 +00:00
James Y Knight	ae28c8d3da	[Sparc] Rename LoadASR and StoreASR from r245360 to *ASI, as was intended. llvm-svn: 245450	2015-08-19 15:59:49 +00:00
Bruno Cardoso Lopes	1983993924	Revert "[PeepholeOptimizer] Look through PHIs to find additional register sources" Revert r245442 while investigating a fix. An assertion hit in http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380 llvm-svn: 245446	2015-08-19 15:10:32 +00:00
James Y Knight	abfcc0fbba	[SPARC] Fix BooleanContents, so that select of a trunc doesn't eliminate the trunc. Differential Revision: http://reviews.llvm.org/D10442 llvm-svn: 245444	2015-08-19 14:47:04 +00:00
Chad Rosier	01663c5b14	[AArch64] Simplify/refactor code to ease code review. NFC. llvm-svn: 245443	2015-08-19 14:34:54 +00:00
Bruno Cardoso Lopes	49eeaf0b66	[PeepholeOptimizer] Look through PHIs to find additional register sources Reapply r243486. - Teaches the ValueTracker in the PeepholeOptimizer to look through PHI instructions. - Add findNextSourceAndRewritePHI method to lookup into multiple sources returned by the ValueTracker and rewrite PHIs with new sources. With these changes we can find more register sources and rewrite more copies to allow coalescing of bitcast instructions. Hence, we eliminate unnecessary VR64 <-> GR64 copies in x86, but it could be extended to other archs by marking "isBitcast" on target specific instructions. The x86 example follows: A: psllq %mm1, %mm0 movd %mm0, %r9 jmp C B: por %mm1, %mm0 movd %mm0, %r9 jmp C C: movd %r9, %mm0 pshufw $238, %mm0, %mm0 Becomes: A: psllq %mm1, %mm0 jmp C B: por %mm1, %mm0 jmp C C: pshufw $238, %mm0, %mm0 Differential Revision: http://reviews.llvm.org/D11197 rdar://problem/20404526 llvm-svn: 245442	2015-08-19 14:34:41 +00:00
Silviu Baranga	55b89ed8ad	[ARM] Add instruction selection patterns for vmin/vmax Summary: The mid-end was generating vector smin/smax/umin/umax nodes, but we were using vbsl to generate the code. This adds the vmin/vmax patterns and a test to check that we are now generating vmin/vmax instructions. Reviewers: rengolin, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D12105 llvm-svn: 245439	2015-08-19 14:11:27 +00:00
Joerg Sonnenberger	f90461ca22	Map %fprs to %asr6 in the Sparc assembler parser. llvm-svn: 245437	2015-08-19 13:55:14 +00:00
Tobias Grosser	e81517c335	Revert "[X86] Widen the 'AND' mask if doing so shrinks the encoding size" This reverts commit 245169 which miscompiles MultiSource/Applications/siod from LNT. llvm-svn: 245432	2015-08-19 11:35:10 +00:00
Michael Kuperstein	d5d8fe4ef2	[X86] Do not lower scalar sdiv/udiv to a shifts + mul sequence when optimizing for minsize There are some cases where the mul sequence is smaller, but for the most part, using a div is preferable. This does not apply to vectors, since x86 doesn't have vector idiv, and a vector mul/shifts sequence ought to be smaller than a scalarized division. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245431	2015-08-19 11:21:43 +00:00
Michael Kuperstein	fcab5e1388	[TLI] Refactor "is integer division cheap" queries. This removes the isPow2SDivCheap() query, as it is not currently used in any meaningful way. isIntDivCheap() no longer relies on a state variable (as all in-tree targets set it to false), but the interface allows querying based on the type optimization level. NFC. Differential Revision: http://reviews.llvm.org/D12082 llvm-svn: 245430	2015-08-19 11:17:59 +00:00
Alex Lorenz	10a69bc7f6	MIR Serialization: Serialize the operand's bit mask target flags. This commit adds support for bit mask target flag serialization to the MIR printer and the MIR parser. It also adds support for the machine operand's target flag serialization to the AArch64 target. Reviewers: Duncan P. N. Exon Smith llvm-svn: 245383	2015-08-18 22:52:15 +00:00
Sanjay Patel	514cb2f8b4	use TLI.allowsMemoryAccess() to check if memory accesses are fast; NFCI This consolidates use of isUnalignedMem32Slow() in one place. There is a slight change in logic although I'm not sure that it would ever come up in the real world: we were assuming that an alignment of the type size is always fast; now, we actually check the data layout to confirm that. llvm-svn: 245382	2015-08-18 22:48:12 +00:00
Joerg Sonnenberger	6697181608	Load/store instructions for floating points with address space require SparcV9. To properly handle this, define the *a instructions as separate instruction classes by refactoring the LoadA and StoreA multiclasses. Move the instruction tests into the sparcv9 file to test the difference. llvm-svn: 245360	2015-08-18 21:31:46 +00:00
David Majnemer	a15246d9db	[WinEH] Calculate state numbers for the new EH representation State numbers are calculated by performing a walk from the innermost funclet to the outermost funclet. Rudimentary support for the new EH constructs has been added to the assembly printer, just enough to test the new machinery. Differential Revision: http://reviews.llvm.org/D12098 llvm-svn: 245331	2015-08-18 19:07:12 +00:00
Matthias Braun	9a77ee2829	MachineRegisterInfo: Introduce isPhysRegUsed() This method checks whether a physical register or any of its aliases are used in the function. Using this function in SIRegisterInfo::findUnusedReg() should also fix this reported failure: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html http://reviews.llvm.org/rL242173#inline-533 The report doesn't come with a testcase and I don't know enough about AMDGPU to create one myself. llvm-svn: 245329	2015-08-18 18:54:27 +00:00
Sanjay Patel	1163b7b428	use minSize wrapper; NFCI These were missed when other uses were switched over: http://llvm.org/viewvc/llvm-project?view=revision&revision=243994 llvm-svn: 245311	2015-08-18 16:44:23 +00:00
Chad Rosier	c8874896fb	[AArch64] Simplify the logic for computing in bounds offset. NFC. llvm-svn: 245307	2015-08-18 16:20:03 +00:00
Daniel Sanders	21e741e772	[mips] Expand JAL instructions when PIC is enabled. Summary: This is the correct way to handle JAL instructions when PIC is enabled. Patch by Toma Tabacu Reviewers: seanbruno, tomatabacu Subscribers: brooks, seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D6231 llvm-svn: 245305	2015-08-18 16:18:09 +00:00
Zoran Jovanovic	dec5269b37	[mips][microMIPS] Implement DDIV, DMOD, DDIVU and DMODU instructions Differential Revision: http://reviews.llvm.org/D10953 llvm-svn: 245297	2015-08-18 14:40:43 +00:00
Zoran Jovanovic	df033f686f	[mips][microMIPS] Implement SW and SWE instructions Differential Revision: http://reviews.llvm.org/D10869 llvm-svn: 245293	2015-08-18 12:53:08 +00:00
Daniel Sanders	056f7ddc7b	[mips] Make the MipsAsmParser capable of knowing whether PIC mode is enabled or not. Summary: This information is needed to decide whether we do the PIC-only JAL expansions or not. It's also needed for an upcoming patch which implements the .cprestore assembler directive (which can only be used effectively in PIC mode). By making this information available to the MipsAsmParser, we will know when to insert the instructions mandated by the .cprestore assembler directive and we will be able to give some useful warnings when we encounter a potential misuse of this directive. Patch by Toma Tabacu Reviewers: dsanders, seanbruno Subscribers: brooks, seanbruno, rafael, llvm-commits Differential Revision: http://reviews.llvm.org/D5626 llvm-svn: 245291	2015-08-18 12:33:54 +00:00
Daniel Sanders	a88078f2d1	[mips] Correct -Woverflow warning in r245208 without changing signedness of the constant. This was supposed to have been committed as part of r245208 llvm-svn: 245285	2015-08-18 09:55:57 +00:00
Guozhi Wei	82b60bf3e7	Align SP adjustment in function getSPAdjust This commit adds a new function TargetFrameLowering::alignSPAdjust and calls it from TargetInstrInfo::getSPAdjust. It fixes PR24142. llvm-svn: 245253	2015-08-17 22:36:27 +00:00
Douglas Katzman	5962bf65c5	[SPARC]: recognize '.' as the start of an assembler expression. llvm-svn: 245232	2015-08-17 19:55:01 +00:00
James Molloy	9672bb0968	[ARM] Fix crash when targeting CPU without NEON We emulate a scalar vmin/vmax with NEON instructions as they don't exist in the VFP ISA. So only mark these as legal when NEON is available. Found here: https://code.google.com/p/chromium/issues/detail?id=521671 llvm-svn: 245231	2015-08-17 19:37:12 +00:00
Silviu Baranga	23dd331f9d	[CostModel][AArch64] Increase cost of vector insert element and add missing cast costs Summary: Increase the estimated costs for insert/extract element operations on AArch64. This is motivated by results from benchmarking interleaved accesses. Add missing costs for zext/sext/trunc instructions and some integer to floating point conversions. These costs were previously calculated by scalarizing these operation and were affected by the cost increase of the insert/extract element operations. Reviewers: rengolin Subscribers: mcrosier, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D11939 llvm-svn: 245226	2015-08-17 16:05:09 +00:00
Silviu Baranga	8db095e89c	[CostModel][ARM] Increase cost of insert/extract operations Summary: This change limits the minimum cost of an insert/extract element operation to 2 in cases where this would result in mixing of NEON and VFP code. Reviewers: rengolin Subscribers: mssimpso, aemerson, llvm-commits, rengolin Differential Revision: http://reviews.llvm.org/D12030 llvm-svn: 245225	2015-08-17 15:57:05 +00:00
Aaron Ballman	0adf961ba2	Correcting a -Woverflow warning where 0xFFFF was overflowing an implicit constant conversion. llvm-svn: 245220	2015-08-17 14:25:57 +00:00
Daniel Sanders	b2dfd80630	[mips] [IAS] Add support for the DLA pseudo-instruction and fix problems with DLI Summary: It is the same as LA, except that it can also load 64-bit addresses and it only works on 64-bit MIPS architectures. Reviewers: tomatabacu, seanbruno, vkalintiris Subscribers: brooks, seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D9524 llvm-svn: 245208	2015-08-17 10:11:55 +00:00
James Molloy	7493c91000	Remove hand-rolled matching for fmin and fmax. SDAGBuilder now does this all for us. llvm-svn: 245198	2015-08-17 07:13:20 +00:00
James Molloy	2eec79630b	Rip out hand-rolled matching code for VMIN, VMAX, VMINNM and VMAXNM This is no longer needed - SDAGBuilder will do this for us. llvm-svn: 245197	2015-08-17 07:13:15 +00:00
Chandler Carruth	4d1e1851a4	[PM] Port ScalarEvolution to the new pass manager. This change makes ScalarEvolution a stand-alone object and just produces one from a pass as needed. Making this work well requires making the object movable, using references instead of overwritten pointers in a number of places, and other refactorings. I've also wired it up to the new pass manager and added a RUN line to a test to exercise it under the new pass manager. This includes basic printing support much like with other analyses. But there is a big and somewhat scary change here. Prior to this patch ScalarEvolution was never actually invalidated!!! Re-running the pass just re-wired up the various other analyses and didn't remove any of the existing entries in the SCEV caches or clear out anything at all. This might seem OK as everything in SCEV that can, uses ValueHandles to track updates to the values that serve as SCEV keys. However, this still means that as we ran SCEV over each function in the module, we kept accumulating more and more SCEVs into the cache. At the end, we would have a SCEV cache with every value that we ever needed a SCEV for in the entire module!!! Yowzers. The releaseMemory routine would dump all of this, but that isn't really called during normal runs of the pipeline as far as I can see. To make matters worse, there is actually a key that we don't update with value handles -- there is a map keyed off of Loops. Because LoopInfo does release its memory from run to run, it is entirely possible to run SCEV over one function, then over another function, and then lookup a Loop* from the second function but find an entry inserted for the first function! Ouch. To make matters still worse, there are plenty of updates that don't trip a value handle. It seems incredibly unlikely that today GVN or another pass that invalidates SCEV can update values in just such a way that a subsequent run of SCEV will incorrectly find lookups in a cache, but it is theoretically possible and would be a nightmare to debug. With this refactoring, I've fixed all this by actually destroying and recreating the ScalarEvolution object from run to run. Technically, this could increase the amount of malloc traffic we see, but then again it is also technically correct. ;] I don't actually think we're suffering from tons of malloc traffic from SCEV because if we were, the fact that we never clear the memory would seem more likely to have come up as an actual problem before now. So, I've made the simple fix here. If in fact there are serious issues with too much allocation and deallocation, I can work on a clever fix that preserves the allocations (while clearing the data) between each run, but I'd prefer to do that kind of optimization with a test case / benchmark that shows why we need such cleverness (and that can test that we actually make it faster). It's possible that this will make some things faster by making the SCEV caches have higher locality (due to being significantly smaller) so until there is a clear benchmark, I think the simple change is best. Differential Revision: http://reviews.llvm.org/D12063 llvm-svn: 245193	2015-08-17 02:08:17 +00:00
Yaron Keren	7da50fadd7	Add missing include guard. llvm-svn: 245173	2015-08-16 07:55:08 +00:00
David Majnemer	54473774f9	[X86] Widen the 'AND' mask if doing so shrinks the encoding size We can set additional bits in a mask given that we know the other operand of an AND already has some bits set to zero. This can be more efficient if doing so allows us to use an instruction which implicitly sign extends the immediate. This fixes PR24085. Differential Revision: http://reviews.llvm.org/D11289 llvm-svn: 245169	2015-08-16 04:52:11 +00:00
Sanjay Patel	17773de9af	[x86] enable machine combiner reassociations for scalar single-precision minimums llvm-svn: 245166	2015-08-15 17:01:54 +00:00
Yaron Keren	1f1dc3836f	Silence VS2015 warning. Patch by James Touton! http://reviews.llvm.org/D11890 llvm-svn: 245161	2015-08-15 14:54:43 +00:00
Simon Pilgrim	6d6e55b42f	[DAGCombiner] Attempt to mask vectors before zero extension instead of after. For cases where we TRUNCATE and then ZERO_EXTEND to a larger size (often from vector legalization), see if we can mask the source data and then ZERO_EXTEND (instead of after an ANY_EXTEND). This can help avoid having to generate a larger mask, and possibly applying it to several sub-vectors. (zext (truncate x)) -> (zext (and(x, m))) Includes a minor patch to SystemZ to better recognise 8/16-bit zero extension patterns from RISBG bit-extraction code. This is the first of a number of minor patches to help improve the conversion of byte masks to clear mask shuffles. Differential Revision: http://reviews.llvm.org/D11764 llvm-svn: 245160	2015-08-15 13:27:30 +00:00
Matt Arsenault	3ad7a3466f	AMDGPU/SI: Only look at live out SGPR defs When trying to fix SGPR live ranges, skip defs that are killed in the same block as the def. I don't think we need to worry about these cases as long as the live ranges of the SGPRs in dominating blocks are correct. This reduces the number of elements the second loop over the function needs to look at, and makes it generally easier to understand. The second loop also only considers if the live range is live in to a block, which logically means it must have been live out from another. llvm-svn: 245150	2015-08-15 02:58:49 +00:00
James Y Knight	4239407b0a	Remove redundant TargetFrameLowering::getFrameIndexOffset virtual function. This was the same as getFrameIndexReference, but without the FrameReg output. Differential Revision: http://reviews.llvm.org/D12042 llvm-svn: 245148	2015-08-15 02:32:35 +00:00
JF Bastien	8905d63bd8	[WebAssembly] Add Relooper This is just an initial checkin of an implementation of the Relooper algorithm, in preparation for WebAssembly codegen to utilize. It doesn't do anything yet by itself. The Relooper algorithm takes an arbitrary control flow graph and generates structured control flow from that, utilizing a helper variable when necessary to handle irreducibility. The WebAssembly backend will be able to use this in order to generate an AST for its binary format. Author: azakai Reviewers: jfb, sunfish Subscribers: jevinskie, arsenm, jroelofs, llvm-commits Differential revision: http://reviews.llvm.org/D11691 llvm-svn: 245142	2015-08-15 01:23:28 +00:00
Matt Arsenault	a02881be87	AMDGPU/SI: Fix printing useless info with amdhsa The comments at the bottom would all report 0 if amdhsa was used. llvm-svn: 245135	2015-08-15 00:12:39 +00:00
Matt Arsenault	084da36b61	AMDGPU/SI: Update LiveVariables This is simple but won't work if/when this pass is moved to be post-SSA. llvm-svn: 245134	2015-08-15 00:12:37 +00:00
Matt Arsenault	8ba82b0f13	AMDGPU/SI: Update LiveIntervals during SIFixSGPRLiveRanges Does not mark SlotIndexes as reserved, although I think that might be OK. LiveVariables still need to be handled. llvm-svn: 245133	2015-08-15 00:12:35 +00:00
Matt Arsenault	eb26935212	AMDGPU: Remove unnecessary assert These shouldn't ever be null. The number of successors was already asserted to be 2. llvm-svn: 245132	2015-08-15 00:12:32 +00:00
Matt Arsenault	f4f804d244	AMDGPU/SI: Make comments more precise. True branch instructions do behave as expected with liveness. Avoid the phrasing "branch decision is based on a value in an SGPR" because this could be misleading. A VALU compare instruction's result is still based on an SGPR, even though that condition may be divergent. llvm-svn: 245131	2015-08-15 00:12:30 +00:00
Pat Gavlin	3420b345f9	Add a target environment for CoreCLR. Although targeting CoreCLR is similar to targeting MSVC, there are certain important differences that the backend must be aware of (e.g. differences in stack probes, EH, and library calls). Differential Revision: http://reviews.llvm.org/D11012 llvm-svn: 245115	2015-08-14 22:41:43 +00:00
Ahmed Bougacha	693e93889e	[AArch64] Fix FMLS scalar-indexed-from-2s-after-neg patterns. We canonicalize V64 vectors to V128 through insert_subvector: the other FMLA/FMLS/FMUL/FMULX patterns match that already, but this one doesn't, so we'd fail to match fmls and generate fneg+fmla instead. The vector equivalents are already tested and functional. llvm-svn: 245107	2015-08-14 22:06:05 +00:00
Tom Stellard	82b166fe2e	AMDGPU/SI: Add missing spill class The compiler was failing to spill for some shaders. Patch By: Axel Davy llvm-svn: 245087	2015-08-14 19:46:05 +00:00
Renato Golin	02ad3dba6e	Revert "[ARM] Fix MachO CPU Subtype selection" This reverts commit r245081, as it breaks many builds. llvm-svn: 245086	2015-08-14 19:35:47 +00:00
Vedant Kumar	6a789375d8	[ARM] Fix MachO CPU Subtype selection This patch makes the Darwin ARM backend take advantage of TargetParser. It also teaches TargetParser about ARMV7K for the first time. This makes target triple parsing more consistent across llvm. Differential Revision: http://reviews.llvm.org/D11996 llvm-svn: 245081	2015-08-14 18:36:47 +00:00
Sanjay Patel	49dad7926f	[x86] fix allowsMisalignedMemoryAccess() implementation This patch fixes the x86 implementation of allowsMisalignedMemoryAccess() to correctly return the 'Fast' output parameter for 32-byte accesses. To test that, an existing load merging optimization is changed to use the TLI hook. This exposes a shortcoming in the current logic and results in the regression test update. Changing other direct users of the isUnalignedMem32Slow() x86 CPU attribute would be a follow-on patch. Without the fix in allowsMisalignedMemoryAccesses(), we will infinite loop when targeting SandyBridge because LowerINSERT_SUBVECTOR() creates 32-byte loads from two 16-byte loads while PerformLOADCombine() splits them back into 16-byte loads. Differential Revision: http://reviews.llvm.org/D10662 llvm-svn: 245075	2015-08-14 17:53:40 +00:00
Rafael Espindola	e13fd954fe	Revert "Centralize the information about which object format we are using." This reverts commit r245047. It was failing on the darwin bots. The problem was that when running ./bin/llc -march=msp430 llc gets to if (TheTriple.getTriple().empty()) TheTriple.setTriple(sys::getDefaultTargetTriple()); Which means that we go with an arch of msp430 but a triple of x86_64-apple-darwin14.4.0 which fails badly. That code has to be updated to select a triple based on the value of march, but that is not a trivial fix. llvm-svn: 245062	2015-08-14 15:48:41 +00:00
Sanjay Patel	27418f3d97	don't repeat function names in comments; NFC llvm-svn: 245058	2015-08-14 15:11:42 +00:00
Rafael Espindola	13e78b34ae	Centralize the information about which object format we are using. Other than some places that were handling unknown as ELF, this should have no change. The test updates are because we were detecting arm-coff or x86_64-win64-coff as ELF targets before. It is not clear if the enum should live on the Triple. At least now it lives in a single location and should be easier to move somewhere else. llvm-svn: 245047	2015-08-14 13:31:17 +00:00
James Molloy	efb759c2e3	[AArch64] FMINNAN/FMAXNAN on f16 is not legal. Spotted by Ahmed - in r244594 I inadvertently marked f16 min/max as legal. I've reverted it here, and marked min/max on scalar f16's as promote. I've also added a testcase. The test just checks that the compiler doesn't fall over - it doesn't create fmin nodes for f16 yet. llvm-svn: 245035	2015-08-14 09:08:50 +00:00
David Majnemer	10f2d9234b	[IR] Add token types This introduces the basic functionality to support "token types". The motivation stems from the need to perform operations on a Value whose provenance cannot be obscured. There are several applications for such a type but my immediate motivation stems from WinEH. Our personality routine enforces a single-entry - single-exit regime for cleanups. After several rounds of optimizations, we may be left with a terminator whose "cleanup-entry block" is not entirely clear because control flow has merged two cleanups together. We have experimented with using labels as operands inside of instructions which are not terminators to indicate where we came from but found that LLVM does not expect such exotic uses of BasicBlocks. Instead, we can use this new type to clearly associate the "entry point" and "exit point" of our cleanup. This is done by having the cleanuppad yield a Token and consuming it at the cleanupret. The token type makes it impossible to obscure or otherwise hide the Value, making it trivial to track the relationship between the two points. What is the burden to the optimizer? Well, it turns out we have already paid down this cost by accepting that there are certain calls that we are not permitted to duplicate, optimizations have to watch out for such instructions anyway. There are additional places in the optimizer that we will probably have to update but early examination has given me the impression that this will not be heroic. Differential Revision: http://reviews.llvm.org/D11861 llvm-svn: 245029	2015-08-14 05:09:07 +00:00
Saleem Abdulrasool	4f55a75e27	PowerPC: remove dead initialization (NFC) Identified by the clang static analyzer. No functional change intended. llvm-svn: 245022	2015-08-14 03:48:35 +00:00
Simon Pilgrim	3d6f76a00a	[AMDGPU] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the AMDGPU implementation D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect. Differential Revision: http://reviews.llvm.org/D12007 llvm-svn: 244960	2015-08-13 21:40:02 +00:00
Ahmed Bougacha	4a160c8a9b	[AArch64] Provide "too few operands" diags on short-form NEON also. We used to just say "invalid type suffix for instruction", which is misleading. This is because we fallback to the long-form matcher if the short-form matcher failed, losing the error information on the way. Save it, so that we can provide a little better diagnostics when the long-form matcher thinks a suffix is the cause of the error. llvm-svn: 244955	2015-08-13 21:09:13 +00:00
Simon Pilgrim	29ec218d1b	[X86][SSE] Use the general SMAX/SMIN/UMAX/UMIN pattern matching and remove the X86 implementation Follow up to D10947 - D9746 added general SMAX/SMIN/UMAX/UMIN pattern matching to SelectionDAGBuilder::visitSelect. This patch removes the X86 implementation and improves the AVX1/AVX2 support to correctly lower 256-bit integer vectors. Differential Revision: http://reviews.llvm.org/D12006 llvm-svn: 244949	2015-08-13 20:45:55 +00:00
Yaron Keren	9267630cbc	Remove and forbid raw_svector_ostream::flush() calls. After r244870 flush() will only compare two null pointers and return, doing nothing but wasting run time. The call is not required any more as the stream and its SmallString are always in sync. Thanks to David Blaikie for reviewing. llvm-svn: 244928	2015-08-13 18:12:56 +00:00
Nemanja Ivanovic	285f278c18	Scalar to vector conversions using direct moves This patch corresponds to review: http://reviews.llvm.org/D11471 It improves the code generated for converting a scalar to a vector value. With direct moves from GPRs to VSRs, we no longer require expensive stack operations for this. Subsequent patches will handle the reverse case and more general operations between vectors and their scalar elements. llvm-svn: 244921	2015-08-13 17:40:44 +00:00
James Molloy	25813b5a03	[ARM] FMINNAN/FMAXNAN of f64 are not legal. This was my error. We've got f32 marked as legal because they're simulated using a v2f32 instruction, but there's no equivalent for f64. This will get test coverage imminently when D12015 lands. llvm-svn: 244916	2015-08-13 17:28:26 +00:00
James Molloy	aa51550e7c	[ARM] Allow vmin/vmax of scalars to be emitted without UseNEONForFP. This overrides the default to more closely resemble the hand-crafted matching logic in ISelLowering. It makes sense, as there is no VFP equivalent of vmin or vmax, to use them when they're available even if in general VFP ops should be preferred. This should be NFC. llvm-svn: 244915	2015-08-13 17:28:20 +00:00
Ulrich Weigand	6643dc8666	[SystemZ] Support large LLVM IR struct return values Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when attempting to compile a routine with a return type of { <4 x float>, <4 x float>, <4 x float>, <4 x float> } on a system without vector instruction support. This is because after legalizing the vector type, we get a return value consisting of 16 floats, which cannot all be returned in registers. Usually, what should happen in this case is that the target's CanLowerReturn routine rejects the return type, in which case SelectionDAG falls back to implementing a structure return in memory via implicit reference. However, the SystemZ target never actually implemented any CanLowerReturn routine, and thus would accept any struct return type. This patch fixes the crash by implementing CanLowerReturn. As a side effect, this also handles fp128 return values, fixing a todo that was noted in SystemZCallingConv.td. llvm-svn: 244889	2015-08-13 13:37:06 +00:00
John Brawn	ecc0ff6b14	[ARM] Reorganise and simplify thumb-1 load/store selection Other than PC-relative loads/store the patterns that match the various load/store addressing modes have the same complexity, so the order that they are matched is the order that they appear in the .td file. Rearrange the instruction definitions in ARMInstrThumb.td, and make use of AddedComplexity for PC-relative loads, so that the instruction matching order is the order that results in the simplest selection logic. This also makes register-offset load/store be selected when it should, as previously it was only selected for too-large immediate offsets. Differential Revision: http://reviews.llvm.org/D11800 llvm-svn: 244882	2015-08-13 10:48:22 +00:00
Ahmed Bougacha	8e769a9269	[AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN. We can lower them using our cool tricks if we fpext/fptrunc the second input, like we do for f32/f64. Follow-up to r243924, r243926, and r244858. llvm-svn: 244860	2015-08-13 01:13:56 +00:00
JF Bastien	9205d59c82	WebAssembly: floating-point comparisons Summary: D11924 implemented part of the floating-point comparisons, this patch implements the rest: * Tell ISelLowering that all booleans are either 0 or 1. * Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does). * Add tests for ord/uno. * Add tests for ueq/one/ult/ule/ugt/uge. * Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating. Reviewers: sunfish Subscribers: llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11970 llvm-svn: 244779	2015-08-12 17:53:29 +00:00
Sanjay Patel	a24f811c83	80-cols; NFC llvm-svn: 244755	2015-08-12 15:12:25 +00:00
Sanjay Patel	463099751f	fix typo; NFC llvm-svn: 244753	2015-08-12 15:09:09 +00:00
Zoran Jovanovic	3c2a065d19	[mips][microMIPS] Create microMIPS64r6 subtarget and implement DALIGN, DAUI, DAHI, DATI, DEXT, DEXTM and DEXTU instructions Differential Revision: http://reviews.llvm.org/D10923 llvm-svn: 244744	2015-08-12 12:45:16 +00:00
Michael Kuperstein	43bbce4282	[X86] Disable mul -> shl + lea combine when compiling for minsize Differential Revision: http://reviews.llvm.org/D11904 llvm-svn: 244740	2015-08-12 11:27:26 +00:00
Michael Kuperstein	8c8a758faa	[X86] Allow x86 call frame optimization to fold more loads into pushes This abstracts away the test for "when can we fold across a MachineInstruction" into the MI interface, and changes call-frame optimization to use the same test the peephole optimizer uses. Differential Revision: http://reviews.llvm.org/D11945 llvm-svn: 244729	2015-08-12 10:14:58 +00:00
Matt Arsenault	ac50a3d981	AMDGPU: Fix assert on dbg_value instructions llvm-svn: 244728	2015-08-12 09:04:44 +00:00
Simon Pilgrim	45d6ddee89	[InstCombine] Move SSE/AVX vector blend folding to instcombiner As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely). InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask). I also moved all the relevant combine tests into InstCombine/blend_x86.ll Differential Revision: http://reviews.llvm.org/D11934 llvm-svn: 244723	2015-08-12 08:08:56 +00:00
Saleem Abdulrasool	23546702ae	X86: hoist a condition into a variable (NFC) The same value is used multiple times through the function. Hoist the condition into a variable. This should fix a silly static analysis warning where the conditions flip around. No functional change intended. llvm-svn: 244713	2015-08-12 02:01:36 +00:00
Sanjay Patel	7b4cd645e8	[x86] enable machine combiner reassociations for 256-bit vector FP mul/add llvm-svn: 244705	2015-08-12 00:29:10 +00:00
Alex Lorenz	ce2812bb8e	PseudoSourceValue: Transform the mips subclass to target independent subclasses This commit transforms the mips-specific 'MipsCallEntry' subclass of the 'PseudoSourceValue' class into two, target-independent subclasses named 'GlobalValuePseudoSourceValue' and 'ExternalSymbolPseudoSourceValue'. This change makes it easier to serialize the pseudo source values by removing target-specific pseudo source values. Reviewers: Akira Hatanaka llvm-svn: 244698	2015-08-11 23:23:17 +00:00
Alex Lorenz	7b1d22a17d	PseudoSourceValue: Replace global manager with a manager in a machine function. This commit removes the global manager variable which is responsible for storing and allocating pseudo source values and instead it introduces a new manager class named 'PseudoSourceValueManager'. Machine functions now own an instance of the pseudo source value manager class. This commit also modifies the 'get...' methods in the 'MachinePointerInfo' class to construct pseudo source values using the instance of the pseudo source value manager object from the machine function. This commit updates calls to the 'get...' methods from the 'MachinePointerInfo' class in a lot of different files because those calls now need to pass in a reference to a machine function to those methods. This change will make it easier to serialize pseudo source values as it will enable me to transform the mips specific MipsCallEntry PseudoSourceValue subclass into two target independent subclasses. Reviewers: Akira Hatanaka llvm-svn: 244693	2015-08-11 23:09:45 +00:00
Alex Lorenz	4047ccf510	PseudoSourceValue: Introduce a 'PSVKind' enumerator. This commit introduces a new enumerator named 'PSVKind' in the 'PseudoSourceValue' class. This enumerator is now used to distinguish between the various kinds of pseudo source values. This change is done in preparation for the changes to the pseudo source value object management and to the PseudoSourceValue's class hierarchy - the next two PseudoSourceValue commits will get rid of the global variable that manages the pseudo source values and the mips specific MipsCallEntry subclass. Reviewers: Akira Hatanaka llvm-svn: 244687	2015-08-11 22:32:00 +00:00
Mark Heffernan	030525f5bf	Use 32-bit divides instead of 64-bit divides where possible. For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason is that many index computations are carried out in 64-bits but never actually exceed the capacity of a 32-bit word. llvm-svn: 244684	2015-08-11 22:16:34 +00:00
JF Bastien	a0847105a1	WebAssembly: implement comparison. Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up. Reviewers: sunfish Subscribers: llvm-commits, jfb Differential Revision: http://reviews.llvm.org/D11924 llvm-svn: 244665	2015-08-11 21:02:46 +00:00
Sanjay Patel	39bab9e7a2	[x86] enable machine combiner reassociations for 128-bit vector single/double multiplies llvm-svn: 244657	2015-08-11 20:19:23 +00:00
JF Bastien	aace81abf2	WebAssembly: implement WebAssemblyTargetLowering::getTargetNodeName Summary: Implementation is the same as in AArch64. Subscribers: aemerson, jfb, llvm-commits, sunfish Differential Revision: http://reviews.llvm.org/D11956 llvm-svn: 244655	2015-08-11 20:13:18 +00:00
Rafael Espindola	574b6734d9	Use llvm::make_unique to fix the MSVC build. llvm-svn: 244641	2015-08-11 18:11:17 +00:00
Michael Kuperstein	8ea9afb887	[X86] Allow merging of immediates within a basic block for code size savings First step in preventing immediates that occur more than once within a single basic block from being pulled into their users, in order to prevent unnecessary large instruction encoding. Currently enabled only when optimizing for size. Patch by: zia.ansari@intel.com Differential Revision: http://reviews.llvm.org/D11363 llvm-svn: 244601	2015-08-11 14:10:58 +00:00
James Molloy	e0929cde28	[AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic. Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmaxnum and match that instead. Minimal functional change: - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistent. - f16 test updated because now we actually generate scalar fminnm/fmaxnm we no longer need to bail out to a libcall! llvm-svn: 244595	2015-08-11 12:06:37 +00:00
James Molloy	655902e549	[AArch64] Replace the custom AArch64ISD::FMIN/MAX nodes with ISD::FMINNAN/MAXNAN NFCI. This just removes custom ISDNodes that are no longer needed. llvm-svn: 244594	2015-08-11 12:06:33 +00:00
James Molloy	d56d688228	[ARM] Match fminnan/fmaxnan for vector vmin/vmax instead of an intrinsic Lower Intrinsic::arm_neon_vmins/vmaxs to fminnan/fmaxnan and match that instead. This is important because SDAG will soon be able to select FMINNAN itself, so we need a unified lowering path for intrinsics and SDAG. NFCI. llvm-svn: 244593	2015-08-11 12:06:28 +00:00
James Molloy	c131e948e8	[ARM] Match fminnum/fmaxnum for vector vminnm/vmaxnm instead of an intrinsic Lower the intrinsic to a FMINNUM/FMAXNUM node and select that instead. This is important because soon SDAG will be able to select FMINNUM/FMAXNUM itself, so we need an integrated lowering path between SDAG and intrinsics. NFCI. llvm-svn: 244592	2015-08-11 12:06:25 +00:00
James Molloy	83cfd780e5	[ARM] Replace ARMISD::VMINNM/VMAXNM with ISD::FMINNUM/FMAXNUM NFCI. This replaces another custom ISDNode with a generic equivalent. llvm-svn: 244591	2015-08-11 12:06:22 +00:00
James Molloy	9564f7ade6	[ARM] Replace ARMISD::FMIN/FMAX with the shiny new ISD::FMINNAN/FMAXNAN. NFCI. This removes a custom ISDNode. llvm-svn: 244590	2015-08-11 12:06:15 +00:00
Marina Yatsina	a28fbe6a96	[X86] Add SAL mnemonics for Intel syntax SAL and SHL instructions perform the same operation Differential Revision: http://reviews.llvm.org/D11882 llvm-svn: 244588	2015-08-11 12:05:06 +00:00
Marina Yatsina	fc986c89c0	[X86] Fix REPE, REPZ, REPNZ for intel syntax REPE, REPZ, REPNZ, REPNE should have mnemonics for Intel syntax as well. Currently using these instructions causes compilation errors for Intel syntax. Differential Revision: http://reviews.llvm.org/D11794 llvm-svn: 244584	2015-08-11 11:28:10 +00:00
Marina Yatsina	d8e14460d5	[X86] Fix imul alias for intel syntax The "imul reg, imm" alias is not defined for intel syntax. In intel syntax there is no w/l/q suffix for the imul instruction. Differential Revision: http://reviews.llvm.org/D11887 llvm-svn: 244582	2015-08-11 10:43:04 +00:00
Vasileios Kalintiris	761ce121c9	[mips] Remap move as or. Summary: This patch remaps the assembly idiom 'move' to 'or' instead of 'daddu' or 'addu'. The use of addu/daddu instead of or as move was highlighted as a performance issue during the analysis of a recent 64bit design. Originally move was encoded as 'or' by binutils but was changed for the r10k cpu family due to their pipeline which had 2 arithmetic units and a single logical unit, and so could issue multiple (d)addu based moves at the same time but only 1 logical move. This patch preserves the disassembly behaviour so that disassembling an old-style (d)addu move still appears as move, but assembling move always gives an or. Patch by Simon Dardis. Reviewers: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11796 llvm-svn: 244579	2015-08-11 08:56:25 +00:00
Michael Kuperstein	ebd10e5c0e	[X86] When optimizing for minsize, use POP for small post-call stack clean-up When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp" following a call by one or two pops, respectively. We don't try to do it in general, but only when the stack adjustment immediately follows a call - which is the most common case. That allows taking a short-cut when trying to find a free register to pop into, instead of a full-blown liveness check. If the adjustment immediately follows a call, then every register the call clobbers but doesn't define should be dead at that point, and can be used. Differential Revision: http://reviews.llvm.org/D11749 llvm-svn: 244578	2015-08-11 08:48:48 +00:00
JF Bastien	7a8b1de402	WebAssembly: NFC fix release build break, unused variable. Summary: Caused by D11914, pointed out by blaikie. Subscribers: llvm-commits, jfb, dblaikie Differential Revision: http://reviews.llvm.org/D11929 llvm-svn: 244570	2015-08-11 04:52:24 +00:00
JF Bastien	b4d2511cd9	WebAssembly: add basic floating-point tests Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested. Subscribers: llvm-commits, sunfish, jfb Differential Revision: http://reviews.llvm.org/D11927 llvm-svn: 244562	2015-08-11 02:45:15 +00:00
Cameron Esfahani	0c35a7deea	Explicitly clear the MI operand list when getInstruction() is called. Call MI.clear() within MCD::OPC_Decode case and inside of translateInstruction() for the X86 target. Remove now unnecessary MI.clear() from ARMDisassembler. Summary: Explicitly clear the MI operand list when getInstruction() is called. Reviewers: hfinkel, t.p.northover, hvarga, kparzysz, jyknight, qcolombet, uweigand Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D11665 llvm-svn: 244557	2015-08-11 01:15:07 +00:00
JF Bastien	6198dac24e	WebAssembly: simply assert on SNaN and NaNs with payloads Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914. Subscribers: llvm-commits, sunfish, jfb Differential Revision: http://reviews.llvm.org/D11925 llvm-svn: 244551	2015-08-11 00:49:20 +00:00
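
The mask-widening idea in 54473774f9 ([X86] Widen the 'AND' mask) can be illustrated with a small stand-alone sketch. This is not the code from that patch: the helper names fitsInSignExtendedImm8 and widenAndMask are hypothetical, and the sketch assumes 32-bit operands and that the bits known to be zero in the other AND operand are supplied by the caller. It only shows why setting bits that are already zero on the other side is safe, and when it lets the immediate fit a sign-extended 8-bit encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a 32-bit immediate equals the sign extension
// of some 8-bit value, i.e. it lies in [0, 0x7F] or [0xFFFFFF80, 0xFFFFFFFF].
bool fitsInSignExtendedImm8(uint32_t imm) {
  return imm <= 0x7Fu || imm >= 0xFFFFFF80u;
}

// Hypothetical helper: given an AND mask and the bits known to be zero in
// the other operand, return a mask that yields the same AND result but fits
// a sign-extended imm8 when that is possible. Otherwise return the mask
// unchanged.
uint32_t widenAndMask(uint32_t mask, uint32_t knownZero) {
  if (fitsInSignExtendedImm8(mask))
    return mask;
  // Bits that are zero in the other operand stay zero after the AND no
  // matter what the mask says, so setting them in the mask is harmless.
  uint32_t widened = mask | knownZero;
  assert((widened & ~knownZero) == (mask & ~knownZero));
  return fitsInSignExtendedImm8(widened) ? widened : mask;
}
```

For example, if the top four bits of the other operand are known to be zero, the mask 0x0FFFFFF0 can be widened to 0xFFFFFFF0, which is -16 as a signed 8-bit immediate.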

34164 Commits
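
As an illustration of 030525f5bf (Use 32-bit divides instead of 64-bit divides where possible), the sketch below shows one way such a narrowing can be expressed in source form. The commit message does not say how the check is performed, so the runtime test here is an assumption, and the function name udiv64ViaNarrow is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: unsigned 64-bit division that takes a 32-bit path
// when both operands fit in 32 bits. Assumes the divisor is nonzero.
uint64_t udiv64ViaNarrow(uint64_t dividend, uint64_t divisor) {
  assert(divisor != 0 && "division by zero");
  // If no bit above bit 31 is set in either operand, both values fit in a
  // 32-bit word and the quotient of the narrow divide is identical.
  if (((dividend | divisor) >> 32) == 0) {
    return static_cast<uint32_t>(dividend) / static_cast<uint32_t>(divisor);
  }
  return dividend / divisor;  // General 64-bit path.
}
```

The narrow path pays off for index computations that are carried out in 64 bits but whose values never exceed 32 bits, which is the case the commit message describes.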