llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00

Author	SHA1	Message	Date
Matt Arsenault	524b24258c	AMDGPU: Add queue ptr intrinsic llvm-svn: 267451	2016-04-25 19:27:18 +00:00
Krzysztof Parzyszek	7bc31501a1	[Hexagon] Register save/restore functions do not follow regular conventions Do not mark them as modifying any of the volatile registers by default. llvm-svn: 267433	2016-04-25 17:49:44 +00:00
Sanjay Patel	2bdc7d43ef	add tests for potential CGP transform (PR27344) llvm-svn: 267426	2016-04-25 16:56:52 +00:00
Marcin Koscielnicki	de3ced2d10	[PR27390] [CodeGen] Reject indexed loads in CombinerDAG. visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420	2016-04-25 15:43:44 +00:00
Hrvoje Varga	e25aaa57ef	[mips][microMIPS] Revert commit r267137 Commit r267137 was the reason for failing tests in LLVM test suite. llvm-svn: 267419	2016-04-25 15:40:08 +00:00
Sanjay Patel	74d1b00e49	[x86] auto-generate checks for cmov tests llvm-svn: 267417	2016-04-25 15:26:57 +00:00
David Majnemer	c78d15d0c3	[WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH successors We didn't have logic to correctly handle CFGs where there was more than one EH-pad successor (these are novel with WinEH). There were situations where a register was live in one exceptional successor but not another but the code as written would only consider the first exceptional successor it found. This resulted in split points which were insufficiently early if an invoke was present. This fixes PR27501. N.B. This removes getLandingPadSuccessor. llvm-svn: 267412	2016-04-25 14:31:32 +00:00
Silviu Baranga	6c665bec7d	[ARM] Add support for the X asm constraint Summary: This patch adds support for the X asm constraint. To do this, we lower the constraint to either a "w" or "r" constraint depending on the operand type (both constraints are supported on ARM). Fixes PR26493 Reviewers: t.p.northover, echristo, rengolin Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19061 llvm-svn: 267411	2016-04-25 14:29:18 +00:00
Artem Tamazov	7fa01faba1	[AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax. Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410	2016-04-25 14:13:51 +00:00
Marcin Koscielnicki	5a59940bb3	[PowerPC] [PR27387] Disallow r0 for ADD8TLS. ADD8TLS, a variant of add instruction used for initial-exec TLS, currently accepts r0 as a source register. While add itself supports r0 just fine, linker can relax it to a local-exec sequence, converting it to addi - which doesn't support r0. Differential Revision: http://reviews.llvm.org/D19193 llvm-svn: 267388	2016-04-25 09:24:34 +00:00
Michael Zuckerman	4cc752e542	Fixing wrong mask size error. From __mmask8 to __mmask16. Was reviewed over the shoulder by AsafBadouh. Connected to review http://reviews.llvm.org/D19195. llvm-svn: 267379	2016-04-25 05:27:51 +00:00
Craig Topper	65fd322915	[X86] Add a complete set of tests for all operand sizes of cttz/ctlz with and without zero undef being lowered to bsf/bsr. llvm-svn: 267373	2016-04-25 01:01:15 +00:00
Simon Pilgrim	386d31614a	[X86][AVX] Added PR24935 test case llvm-svn: 267362	2016-04-24 20:30:48 +00:00
Saleem Abdulrasool	13a27b5a96	ARM: fix __chkstk Frame Setup on WoA This corrects the MI annotations for the stack adjustment following the __chkstk invocation. We were marking the original SP usage as a Def rather than Kill. The (new) assigned value is the definition, the original reference is killed. Adjust the ISelLowering to mark Kills and FrameSetup as well. This partially resolves PR27480. llvm-svn: 267361	2016-04-24 20:12:48 +00:00
Simon Pilgrim	30bc5e06bf	[X86][SSE] Added SSSE3/AVX/AVX2 BITREVERSE tests Codegen is pretty bad at the moment but could use PSHUFB quite efficiently llvm-svn: 267347	2016-04-24 15:45:06 +00:00
Simon Pilgrim	9549c9fa45	[X86][XOP] Fixed VPPERM permute op decoding (PR27472). Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346	2016-04-24 15:05:04 +00:00
Simon Pilgrim	1211a431ae	[X86][SSE] Improved support for decoding target shuffle masks through bitcasts Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343	2016-04-24 14:53:54 +00:00
Marcin Koscielnicki	6b999dbbc0	[SystemZ] [SSP] Add support for LOAD_STACK_GUARD. This fixes PR22248 on s390x. The previous attempt at this was D19101, which was before LOAD_STACK_GUARD existed. Compared to the previous version, this always emits a rather ugly block of 4 instructions, involving a thread pointer load that can't be shared with other potential users. However, this is necessary for SSP - spilling the guard value (or thread pointer used to load it) is counter to the goal, since it could be overwritten along with the frame it protects. Differential Revision: http://reviews.llvm.org/D19363 llvm-svn: 267340	2016-04-24 13:57:49 +00:00
Simon Pilgrim	79561eb759	[X86][SSE] Demonstrate issue with decoding shuffle masks that have been lowered as rematerialized constants on scalar unit Found whilst investigating PR27472 llvm-svn: 267339	2016-04-24 13:45:30 +00:00
Gerolf Hoflehner	f601ce3709	[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) The original patch caused crashes because it could derefence a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328	2016-04-24 05:14:01 +00:00
Craig Topper	917b9d7a47	[X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311	2016-04-24 02:01:22 +00:00
Duncan P. N. Exon Smith	94512f24de	DebugInfo: Remove MDString-based type references Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around DIType*. It is no longer legal to refer to a DICompositeType by its 'identifier:', and DIBuilder no longer retains all types with an 'identifier:' automatically. Aside from the bitcode upgrade, this is mainly removing logic to resolve an MDString-based reference to an actualy DIType. The commits leading up to this have made the implicit type map in DICompileUnit's 'retainedTypes:' field superfluous. This does not remove DITypeRef, DIScopeRef, DINodeRef, and DITypeRefArray, or stop using them in DI-related metadata. Although as of this commit they aren't serving a useful purpose, there are patchces under review to reuse them for CodeView support. The tests in LLVM were updated with deref-typerefs.sh, which is attached to the thread "[RFC] Lazy-loading of debug info metadata": http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html llvm-svn: 267296	2016-04-23 21:08:00 +00:00
Renato Golin	d3f349208c	Revert "[AArch64] Fix optimizeCondBranch logic." This reverts commit r267206, as it broke self-hosting on AArch64. llvm-svn: 267294	2016-04-23 19:30:52 +00:00
Simon Pilgrim	581d3c8ed8	[X86][XOP] Added VPPERM -> BLEND-WITH-ZERO Test Currently failing due to poor blend matching, found whilst investigating PR27472 llvm-svn: 267282	2016-04-23 11:14:18 +00:00
Craig Topper	aebe1f85c0	[CodeGen] When promoting CTTZ operations to larger type, don't insert a select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part. llvm-svn: 267280	2016-04-23 05:20:47 +00:00
Matt Arsenault	3cbb6d7d74	AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size llvm-svn: 267244	2016-04-22 22:59:16 +00:00
NAKAMURA Takumi	2e521b8375	Fix llvm/test/CodeGen/ARM/Windows/dbzchk.ll not to check mixed output, take #2 . llvm-svn: 267242	2016-04-22 22:51:48 +00:00
Matt Arsenault	b274219b5d	AMDGPU: Re-visit nodes in performAndCombine This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240	2016-04-22 22:48:38 +00:00
Sriraman Tallam	bdcd2a52a2	Differential Revision: http://reviews.llvm.org/D19040 llvm-svn: 267229	2016-04-22 21:41:58 +00:00
Peter Collingbourne	d04766ba20	Introduce llvm.load.relative intrinsic. This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223	2016-04-22 21:18:02 +00:00
Matt Arsenault	953215125b	DAGCombiner: Relax alignment restriction when changing store type If the target allows the alignment, this should be OK. llvm-svn: 267217	2016-04-22 21:01:41 +00:00
Peter Collingbourne	df3f55c3ab	CodeGen: Use PLT relocations for relative references to unnamed_addr functions. The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211	2016-04-22 20:40:10 +00:00
Matt Arsenault	9b242e0d0e	DAGCombiner: Relax alignment restriction when changing load type If the target allows the alignment, this should still be OK. llvm-svn: 267209	2016-04-22 20:21:36 +00:00
Quentin Colombet	dfaf1211a5	[AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267206	2016-04-22 20:09:58 +00:00
Matthias Braun	97996c46e1	MachineScheduler: Limit the size of the ready list. Avoid quadratic complexity in unusually large basic blocks by limiting the size of the ready lists. Differential Revision: http://reviews.llvm.org/D19349 llvm-svn: 267189	2016-04-22 19:09:17 +00:00
Quentin Colombet	e89cf71564	[AArch64] When creating MRS instruction, make sure the destination register is declared as a definition. This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll. llvm-svn: 267185	2016-04-22 18:46:17 +00:00
Quentin Colombet	e4d08c7e23	[AArch64][AdvSIMDScalar] Update the kill flags correctly. We used to simply set the kill flags to true when transforming a scalar instruction to a vector one. SrcScalar1 = copy SrcVector1 ... = opScalar SrcScalar1 => SrcScalar1 = copy SrcVector1 ... = opVector SrcVector1<kill> This is obviously wrong. The proper update consists in: 1. Propagate the kill status from the copy to the new opVector 2. Reset the kill status on the copy, since the live-range of SrcVector1 got extended. This fixes some of the machine verifier errors for AArch64 with make check. llvm-svn: 267180	2016-04-22 18:09:14 +00:00
Saleem Abdulrasool	deec7429ad	test: split test into two runs Rather than checking both stdout and stderr simultaneously, split it into two tests. This apparently breaks on Windows where MSVCRT does not buffer output correctly. NFC. Thanks to chapuni for bringing the issue to my attention! llvm-svn: 267179	2016-04-22 18:06:51 +00:00
Krzysztof Parzyszek	78673d03bc	[Hexagon] Properly close live range in HexagonBlockRanges ---add testcase llvm-svn: 267174	2016-04-22 17:30:13 +00:00
Konstantin Zhuravlyov	613d1574ee	[AMDGPU] Insert nop pass: take care of outstanding feedback - Switch few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167	2016-04-22 17:04:51 +00:00
Krzysztof Parzyszek	e1e99be481	[Hexagon] Teach mux expansion how to deal with undef predicates llvm-svn: 267165	2016-04-22 16:47:01 +00:00
Nirav Dave	2e439346e9	Emit code16 in assembly in 16-bit mode Summary: When generating assembly using -m16 we must explicitly mark it as 16-bit. Emit .code16 at beginning of file. Fixes wrong results when using -fno-integrated-as. Reviewers: dwmw2 Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19392 llvm-svn: 267152	2016-04-22 13:36:11 +00:00
Simon Dardis	c64ea03044	[mips] Fix select patterns for MIPS64 When targetting MIPS64R6 some of the patterns for select were guarded by a broken predicate. The predicate was supposed to test if a constant value could fit in a 16 bit zero-extended field. Instead the value was tested to fit in a 16 bit sign-extended field. For negative constants of native word width this resulted in wrong code generation. Reviewers: vkalintiris, dsanders Differential Review: http://reviews.llvm.org/D19378 llvm-svn: 267151	2016-04-22 13:19:22 +00:00
Hrvoje Varga	0102a446f7	[mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions Differential Revision: http://reviews.llvm.org/D19354 llvm-svn: 267137	2016-04-22 11:18:40 +00:00
Daniel Sanders	ffe901fc26	Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127	2016-04-22 09:37:26 +00:00
Nicolai Haehnle	f54e57a212	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102	2016-04-22 04:04:08 +00:00
Craig Topper	6ea4b97a42	[AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ. llvm-svn: 267100	2016-04-22 03:22:38 +00:00
Gerolf Hoflehner	d63bafa58b	[MachineCombiner] Support for floating-point FMA on ARM64 Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The targets decides whether the machine combiner should optimize for throughput or latency. - Supports for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098	2016-04-22 02:15:19 +00:00
Nico Weber	84e1414d06	Try to fix UNRESOLVED: LLVM :: CodeGen/AArch64/arm64-regress-opt-cmp.s on bots. This test used to write a .s file until r266971 fixed that. But on most bots, the .s file still exists. Add an rm statement to clean up the bots. In a few days, this statement can go away again. llvm-svn: 267095	2016-04-22 01:08:56 +00:00
Saleem Abdulrasool	eed028bbcd	ARM: fix test for Windows division This was meant to be part of SVN r267080. cbz cannot use a high register, which would be silently truncated. This has now been fixed. llvm-svn: 267092	2016-04-22 01:03:38 +00:00
Dan Gohman	fdbfed8f83	[WebAssembly] Limit alignment hints to natural alignment. This follows the current binary format rules. llvm-svn: 267082	2016-04-21 23:59:48 +00:00
Saleem Abdulrasool	4fb09b352f	ARM: restrict register class for WIN__DBZCHK WIN__DBZCHK will insert a CBZ instruction into the stream. This instruction reserves 3 bits for the condition register (rn). As such, we must ensure that we restrict the register to a low register. Use the tGPR class instead of GPR to ensure that this is properly constrained. In debug builds, we would attempt to use lr as a condition register which would silently get truncated with no hint that the register selection was incorrect. llvm-svn: 267080	2016-04-21 23:53:19 +00:00
Sanjay Patel	081b99ba47	add tests for disguised fabs/fneg llvm-svn: 267053	2016-04-21 21:02:25 +00:00
Sanjay Patel	67f5c38cab	use FileCheck; add test for disguised fabs llvm-svn: 267051	2016-04-21 20:58:58 +00:00
Krzysztof Parzyszek	b6d9f909f0	[Hexagon] Expand handling of the small-data/bss section llvm-svn: 267034	2016-04-21 18:56:45 +00:00
Matt Arsenault	a95bd86b3c	DAGCombiner: Reduce 64-bit BFE pattern to pattern on 32-bit component If the extracted bits are restricted to the upper half or lower half, this can be truncated. llvm-svn: 267024	2016-04-21 18:03:06 +00:00
Marcin Koscielnicki	452981873a	[PowerPC] [SSP] Fix stack guard load for 32-bit. r266809 incorrectly used LD to load the stack guard, it should be LWZ. Differential Revision: http://reviews.llvm.org/D19358 llvm-svn: 267017	2016-04-21 17:36:05 +00:00
Evgeny Astigeevich	8e76562ba2	Updated a test not to produce an empty s-file. llvm-svn: 266971	2016-04-21 09:36:49 +00:00
Evgeny Astigeevich	4405b74d21	[AArch64][CodeGen] Fix of PR27158: incorrect peephole optimization in AArch64InstrInfo::optimizeCompareInstr AArch64InstrInfo::optimizeCompareInstr has bug PR27158 which causes generation of incorrect code. A compare instruction is substituted with another instruction which does not produce the same flags as the original compare instruction. This patch contains: 1. Fix of the bug. 2. A regression test in MIR. 3. A new test to check that SUBS is replaced by SUB. Differential Revision: http://reviews.llvm.org/D18838 llvm-svn: 266969	2016-04-21 08:54:08 +00:00
Craig Topper	b64a162a22	[AVX512] Add CTTZ support for v8i64 and v16i32 vectors. llvm-svn: 266968	2016-04-21 07:30:06 +00:00
Craig Topper	22948cc4a9	[X86] Fix vector-tzcnt-512 test to disable CDI while enabling BWI for one of the runs. Update check patterns accordingly. llvm-svn: 266967	2016-04-21 07:30:03 +00:00
Craig Topper	1c2be5dcd4	Fix test command line to explicitly disable CDI instructions for one test. llvm-svn: 266966	2016-04-21 07:29:59 +00:00
Craig Topper	dc636097f0	[AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions. llvm-svn: 266963	2016-04-21 06:39:34 +00:00
Craig Topper	7e74d3bc64	[AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions. Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results. llvm-svn: 266950	2016-04-21 03:57:24 +00:00
Jacques Pienaar	0cb285e7fd	[lanai] Add subword scheduling itineraries. Differentiate between word and subword memory operations as they take different amount of cycles to complete. This just adds a basic model of the subword latency to the scheduler. llvm-svn: 266898	2016-04-20 18:28:55 +00:00
Krzysztof Parzyszek	17749987f0	[RDF] Consider register as live if any alias is live This only affects the recomputation of kill flags. llvm-svn: 266875	2016-04-20 14:33:23 +00:00
Asaf Badouh	9cc5def4a3	[X86] enable PIE for functions Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863	2016-04-20 08:32:57 +00:00
Craig Topper	6c544565e4	[AVX512] Add avx512cd+vl runs to vector-tzcnt-128/256 tests to show using the vplzcntd/q instructions. llvm-svn: 266860	2016-04-20 05:19:01 +00:00
Craig Topper	73cbc9c236	[AVX512] Update vector-tzcnt-512 test to show how bad v32i16 and v64i8 is with avx512bw enabled. llvm-svn: 266859	2016-04-20 05:18:58 +00:00
Craig Topper	f480c8bddb	[AVX512] Add popcount support for v32i16 and v64i8. llvm-svn: 266858	2016-04-20 05:18:55 +00:00
Marcin Koscielnicki	64bfaf0336	[SystemZ] Add support for llvm.thread.pointer intrinsic. Differential Revision: http://reviews.llvm.org/D19054 llvm-svn: 266844	2016-04-20 01:03:48 +00:00
Mandeep Singh Grang	28ddad394b	[LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC. Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests. Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier Subscribers: mcrosier, dsanders Differential Revision: http://reviews.llvm.org/D19279 llvm-svn: 266834	2016-04-19 23:51:52 +00:00
Tim Northover	8ead88dca1	ARM: fix assertion failure on -O0 cmpxchg. Because lowering of CMP_SWAP_64 occurs during type legalization, there can be i64 types produced by more than just a BUILD_PAIR or similar. My initial tests used just incoming function args. llvm-svn: 266828	2016-04-19 22:25:02 +00:00
Nicolai Haehnle	0c7a341af5	Add IntrWrite[Arg]Mem intrinsic property Summary: This property is used to mark an intrinsic that only writes to memory, but neither reads from memory nor has other side effects. An example where this is useful is the llvm.amdgcn.buffer.store.format.* intrinsic, which corresponds to a store instruction that goes through a special buffer descriptor rather than through a plain pointer. With this property, the intrinsic should still be handled as having side effects at the LLVM IR level, but machine scheduling can make smarter decisions. Reviewers: tstellarAMD, arsenm, joker.eph, reames Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18291 llvm-svn: 266826	2016-04-19 21:58:33 +00:00
Nicolai Haehnle	e4cebbe0dc	AMDGPU: Guard VOPC instructions against incorrect commute Summary: The added testcase, which triggered this, was derived from a shader-db case via bugpoint. A separate question is why scalar branching wasn't used. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19208 llvm-svn: 266825	2016-04-19 21:58:22 +00:00
Krzysztof Parzyszek	6a86876079	[Hexagon] Fix operand swapping in HexagonPeephole Also, disable zero- and size-extend optimizations for now. llvm-svn: 266821	2016-04-19 21:36:24 +00:00
Marcin Koscielnicki	0919e582e6	[AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic. Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that just return the thread pointer. I have a pending patch that does the same for SystemZ (D19054), and there are many more targets that could benefit from one. This patch merges the ARM and AArch64 intrinsics into a single target independent one that will also be used by subsequent targets. Differential Revision: http://reviews.llvm.org/D19098 llvm-svn: 266818	2016-04-19 20:51:05 +00:00
Krzysztof Parzyszek	5607676c28	[Hexagon] Fix printing the address operand of S2_storerinewabs llvm-svn: 266811	2016-04-19 20:20:33 +00:00
Tim Shen	a12f27cb73	[PPC, SSP] Support PowerPC Linux stack protection. llvm-svn: 266809	2016-04-19 20:14:52 +00:00
Tim Shen	3a75cd4bf9	[SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806	2016-04-19 19:40:37 +00:00
Jacques Pienaar	88ae65cfbd	[lanai] Add lowering for SETCCE i32. * Add lowering for SETCCE i32. * Add test to check lowering of i64 compares uses SETCCE expansion (outside of EQ and NE). * Fix select.ll test and immediate form selection for RI operations. llvm-svn: 266802	2016-04-19 19:15:25 +00:00
Simon Pilgrim	e1392bc92c	[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not. Differential Revision: http://reviews.llvm.org/D19228 llvm-svn: 266728	2016-04-19 12:26:40 +00:00
Sanjoy Das	91fd65c3a6	Introduce a "patchable-function" function attribute Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated code easily patchable at runtime (for instance, to redirect control). Right now only one patchability scheme is supported, `"prologue-short-redirect"`, but this can be expanded in the future. Reviewers: joker.eph, rnk, echristo, dberris Subscribers: joker.eph, echristo, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19046 llvm-svn: 266715	2016-04-19 05:24:47 +00:00
Tim Northover	5f6de253c5	ARM: use a pseudo-instruction for cmpxchg at -O0. The fast register-allocator cannot cope with inter-block dependencies without spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions where any value produced within a block is dead by the end, but not for cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded after regalloc. Fortunately this is at -O0 so we don't have to care about performance. This simplifies the various axes of expansion considerably: we assume a strong seq_cst operation and ensure ordering via the always-present DMB instructions rather than v8 acquire/release instructions. Should fix the 32-bit part of PR25526. llvm-svn: 266679	2016-04-18 21:48:55 +00:00
Simon Pilgrim	72df26628e	[X86][SSE] Test case for PR2585 llvm-svn: 266669	2016-04-18 21:07:49 +00:00
Simon Pilgrim	2ca1c20043	[X86][AVX] Added extra memory folding tests for D19228 llvm-svn: 266662	2016-04-18 19:48:16 +00:00
Simon Pilgrim	3b3d060fd0	[X86][AVX] Added zero+blend vs vperm2f128 optsize tests cases (PR22984) We should be trying to use vperm2f128 instead of zero+blend (if we're only the user of zero?) when optsize is enabled. llvm-svn: 266632	2016-04-18 17:14:04 +00:00
Konstantin Zhuravlyov	809d827c49	[AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt Also, - Skip pass if machine module does not have debug info - Minor comment changes - Added test Differential Revision: http://reviews.llvm.org/D19079 llvm-svn: 266626	2016-04-18 16:28:23 +00:00
Simon Pilgrim	d93cf6e183	[X86][AVX] Renamed vperm2f128 test to make it quicker to review missed one the first time round... llvm-svn: 266623	2016-04-18 16:08:19 +00:00
Simon Pilgrim	cb2fac28c3	[X86][AVX] Renamed vperm2f128 tests to make it quicker to review llvm-svn: 266621	2016-04-18 15:37:45 +00:00
Strahinja Petrovic	19968b2801	[PowerPC] add comment to test Added comment in test for soft-float operations on ppc architecture. Test commit. llvm-svn: 266600	2016-04-18 11:52:14 +00:00
Simon Pilgrim	bb26e7370b	[X86][SSE] Added 16i8 -> 8i64 sext test Shows poor codegen for AVX2 llvm-svn: 266560	2016-04-17 15:10:42 +00:00
Craig Topper	bb00d9af19	[AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. llvm-svn: 266554	2016-04-17 07:25:39 +00:00
Simon Pilgrim	fc44bd8574	[X86][AVX] Add shuffle combine tests for MOVDDUP/MOVSHDUP/MOVSLDUP 128, 256 and 512 bit implementations (some not yet supported by combineX86ShuffleChain) llvm-svn: 266535	2016-04-16 20:30:59 +00:00
Simon Pilgrim	cd3aca2121	[X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support Added additional test that peeks through bitcast to v16i8 mask llvm-svn: 266533	2016-04-16 17:52:07 +00:00
Simon Pilgrim	8fc00dfd01	[X86][XOP] More VPPERM shuffle mask decode tests As requested by D18441 llvm-svn: 266531	2016-04-16 16:37:21 +00:00
Matt Arsenault	ff544fe603	AMDGPU: Enable LocalStackSlotAllocation pass This resolves more frame indexes early and folds the immediate offsets into the scratch mubuf instructions. This cleans up a lot of the mess that's currently emitted, such as emitting add 0s and repeatedly initializing the same register to 0 when spilling. llvm-svn: 266508	2016-04-16 02:13:37 +00:00
Matt Arsenault	5d84ff0690	AMDGPU: Use s_addk_i32 / s_mulk_i32 llvm-svn: 266506	2016-04-16 01:46:49 +00:00
Wei Mi	2936fb2655	Don't skip splitSeparateComponents in eliminateDeadDefs for HoistSpillHelper::hoistAllSpills. Because HoistSpillHelper::hoistAllSpills is called in postOptimization, before the patch we didn't want LiveRangeEdit::eliminateDeadDefs to call splitSeparateComponents and generate unassigned new vregs. However, skipping splitSeparateComponents will make verify-machineinstrs unhappy, so I remove the early return, and use HoistSpillHelper::LRE_DidCloneVirtReg to assign physreg/stackslot for those new vregs. In addition, some code reorganization to make class HoistSpillHelper privately inheriting from LiveRangeEdit::Delegate possible. This is to be consistent with class RAGreedy and class RegisterCoalescer. Differential Revision: http://reviews.llvm.org/D19142 llvm-svn: 266489	2016-04-15 23:16:44 +00:00
Hans Wennborg	dc4376ad48	Switch lowering: don't add incoming PHI values from skipped bit test MBB's (PR27135) After r245976, LLVM will skip the last bit test case if knows it will always be true. However, we would still erroneously update PHI nodes with incoming values from the MBB that would perform the final bit test, causing -verify-machineinstrs to fail. llvm-svn: 266479	2016-04-15 21:45:30 +00:00
Adrian Prantl	d83fea5735	Let the DISubprogram in this test point to the right compile unit. llvm-svn: 266468	2016-04-15 19:38:14 +00:00
Adrian Prantl	45487c07fc	Update testcase to new debug metadata format. llvm-svn: 266467	2016-04-15 19:32:22 +00:00
Chad Rosier	8ec20feca4	[AArch64] Add load/store pair instructions to getMemOpBaseRegImmOfsWidth(). This improves AA in the MI schduler when reason about paired instructions. Phabricator Revision: http://reviews.llvm.org/D17098 PR26358 llvm-svn: 266462	2016-04-15 18:09:10 +00:00
Marcin Koscielnicki	f442630636	[SystemZ] Fix large tests broken by conditional returns. These were broken by D17339. Differential Revision: http://reviews.llvm.org/D19158 llvm-svn: 266454	2016-04-15 17:24:40 +00:00
Geoff Berry	79ebd7fb73	Fix test to require Asserts since it uses debug output. llvm-svn: 266448	2016-04-15 16:09:00 +00:00
Adrian Prantl	fb3abba237	[PR27284] Reverse the ownership between DICompileUnit and DISubprogram. Currently each Function points to a DISubprogram and DISubprogram has a scope field. For member functions the scope is a DICompositeType. DIScopes point to the DICompileUnit to facilitate type uniquing. Distinct DISubprograms (with isDefinition: true) are not part of the type hierarchy and cannot be uniqued. This change removes the subprograms list from DICompileUnit and instead adds a pointer to the owning compile unit to distinct DISubprograms. This would make it easy for ThinLTO to strip unneeded DISubprograms and their transitively referenced debug info. Motivation ---------- Materializing DISubprograms is currently the most expensive operation when doing a ThinLTO build of clang. We want the DISubprogram to be stored in a separate Bitcode block (or the same block as the function body) so we can avoid having to expensively deserialize all DISubprograms together with the global metadata. If a function has been inlined into another subprogram we need to store a reference the block containing the inlined subprogram. Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script that updates LLVM IR testcases to the new format. http://reviews.llvm.org/D19034 <rdar://problem/25256815> llvm-svn: 266446	2016-04-15 15:57:41 +00:00
NAKAMURA Takumi	b397b15fe8	llvm/test/CodeGen/AArch64/arm64-csldst-mmo.ll requires +Asserts. llvm-svn: 266443	2016-04-15 15:37:27 +00:00
Geoff Berry	56afefb9c8	[AArch64] Add MMOs to callee-save load/store instructions. Summary: Without MMOs, the callee-save load/store instructions were treated as volatile by the MI post-RA scheduler and AArch64LoadStoreOptimizer. Reviewers: t.p.northover, mcrosier Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D17661 llvm-svn: 266439	2016-04-15 15:16:19 +00:00
Nirav Dave	a36a5aeca3	Fix typing on generated LXV2DX/STXV2DX instructions [PPC] Previously when casting generic loads to LXV2DX/ST instructions we would leave the original load return type in place allowing for an assertion failure when we merge two equivalent LXV2DX nodes with different types. This fixes PR27350. Reviewers: nemanjai Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19133 llvm-svn: 266438	2016-04-15 15:01:38 +00:00
Jun Bum Lim	ad7ab4cf46	[MachineScheduler]Add support for store clustering Perform store clustering just like load clustering. This change add StoreClusterMutation in machine-scheduler. To control StoreClusterMutation, added enableClusterStores() in TargetInstrInfo.h. This is enabled only on AArch64 for now. This change also add support for unscaled stores which were not handled in getMemOpBaseRegImmOfs(). llvm-svn: 266437	2016-04-15 14:58:38 +00:00
Nicolai Haehnle	730a184595	AMDGPU/SI: Fix regression with no-return atomics Summary: In the added test-case, the atomic instruction feeds into a non-machine CopyToReg node which hasn't been selected yet, so guard against non-machine opcodes here. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19043 llvm-svn: 266433	2016-04-15 14:42:36 +00:00
Justin Lebar	d4f9b5931f	Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target. llvm-svn: 266403	2016-04-15 01:20:52 +00:00
Matt Arsenault	18048c7ee0	AMDGPU: Include LDS size in printed comment llvm-svn: 266382	2016-04-14 22:11:51 +00:00
Matt Arsenault	2dfb6d03c5	AMDGPU: Run SIFoldOperands after PeepholeOptimizer PeepholeOptimizer cleans up redundant copies, which makes the operand folding more effective. shader-db stats: Totals: SGPRS: 34200 -> 34336 (0.40 %) VGPRS: 22118 -> 21655 (-2.09 %) Code Size: 632144 -> 633460 (0.21 %) bytes LDS: 11 -> 11 (0.00 %) blocks Scratch: 10240 -> 11264 (10.00 %) bytes per wave Max Waves: 8822 -> 8918 (1.09 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 7704 -> 7840 (1.77 %) VGPRS: 5169 -> 4706 (-8.96 %) Code Size: 234444 -> 235760 (0.56 %) bytes LDS: 2 -> 2 (0.00 %) blocks Scratch: 0 -> 1024 (0.00 %) bytes per wave Max Waves: 1188 -> 1284 (8.08 %) Wait states: 0 -> 0 (0.00 %) Increases: SGPRS: 35 (0.01 %) VGPRS: 1 (0.00 %) Code Size: 59 (0.02 %) LDS: 0 (0.00 %) Scratch: 1 (0.00 %) Max Waves: 48 (0.02 %) Wait states: 0 (0.00 %) Decreases: SGPRS: 26 (0.01 %) VGPRS: 54 (0.02 %) Code Size: 68 (0.03 %) LDS: 0 (0.00 %) Scratch: 0 (0.00 %) Max Waves: 4 (0.00 %) Wait states: 0 (0.00 %) llvm-svn: 266378	2016-04-14 21:58:24 +00:00
Matt Arsenault	e73cb153a7	AMDGPU: Fold bitcasts of scalar constants to vectors This cleans up some messes since the individual scalar components can be CSEed. llvm-svn: 266376	2016-04-14 21:58:07 +00:00
Tom Stellard	383448325b	AMDGPU: Add skeleton GlobalIsel implementation Summary: This adds the necessary target code to be able to run the ir translator. Lowering function arguments and returns is a nop and there is no support for RegBankSelect. Reviewers: arsenm, qcolombet Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits Differential Revision: http://reviews.llvm.org/D19077 llvm-svn: 266356	2016-04-14 19:09:28 +00:00
Jacques Pienaar	7870fedd7e	[lanai] Add custom lowering for SRL_PARTS i32. llvm-svn: 266349	2016-04-14 17:59:22 +00:00
Nicolai Haehnle	bbd04b8c49	[DivergenceAnalysis] Treat PHI with incoming undef as constant Summary: If a PHI has an incoming undef, we can pretend that it is equal to one non-undef, non-self incoming value. This is particularly relevant in combination with the StructurizeCFG pass, which introduces PHI nodes with undefs. Previously, this lead to branch conditions that were uniform before StructurizeCFG to become non-uniform afterwards, which confused the SIAnnotateControlFlow pass. This fixes a crash when Mesa radeonsi compiles a shader from dEQP-GLES3.functional.shaders.switch.switch_in_for_loop_dynamic_vertex Reviewers: arsenm, tstellarAMD, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19013 llvm-svn: 266347	2016-04-14 17:42:47 +00:00
Nicolai Haehnle	014104e476	AMDGPU: Remove SIFixSGPRLiveRanges pass Summary: This pass is unnecessary and overly conservative. It was motivated by situations like def %vreg0:SGPR_32 ... if-block: .. def %vreg1:SGPR_32 ... else-block: ... use %vreg0:SGPR_32 ... and similar situations with uses after the non-uniform control flow, where we are not allowed to assign %vreg0 and %vreg1 to the same physical register, even though in the original, thread/workitem-based CFG, it looks like the live ranges of these registers do not overlap. However, by the time register allocation runs, we have moved to a wave-based CFG that accurately represents the fact that the wave may run through both the if- and the else-block. So the live ranges of %vreg0 and %vreg1 already overlap even without the SIFixSGPRLiveRanges pass. In addition to proving this change correct, I have tested it with Piglit and a small number of other tests. Reviewers: arsenm, tstellarAMD Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19041 llvm-svn: 266345	2016-04-14 17:42:29 +00:00
Tim Northover	d6721e4e7e	AArch64: expand cmpxchg after regalloc at -O0. FastRegAlloc works only at the basic-block level and spills all live-out registers. Unfortunately for a stack-based cmpxchg near the spill slots, this can perpetually clear the exclusive monitor, which means the cmpxchg will never succeed. I believe the only way to handle this within LLVM is by expanding the loop post-regalloc. We don't want this in general because it severely limits the optimisations that can be done, so we limit this to -O0 compilations. It's an ugly hack, and about the one good point in the whole mess is that we can treat all cmpxchg operations in the most naive way possible (seq_cst, no clrex faff) without affecting correctness. Should fix PR25526. llvm-svn: 266339	2016-04-14 17:03:29 +00:00
Jacques Pienaar	77c93881e7	[lanai] Add areMemAccessesTriviallyDisjoint, getMemOpBaseRegImmOfs and getMemOpBaseRegImmOfsWidth. Summary: Add getMemOpBaseRegImmOfsWidth to enable determining independence during MiSched. Reviewers: eliben, majnemer Subscribers: mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18903 llvm-svn: 266338	2016-04-14 16:47:42 +00:00
Tom Stellard	c0b7282ebc	AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit Summary: For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use less registers, as we have to run more waves per SIMD. This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults, to 256, as that has no wave restrictions. Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug. Reviewers: mareko, arsenm, tstellarAMD, nhaehnle Subscribers: FireBurn, kerberizer, llvm-commits, arsenm Differential Revision: http://reviews.llvm.org/D18340 Patch By: Bas Nieuwenhuizen llvm-svn: 266337	2016-04-14 16:27:07 +00:00
Tom Stellard	b9654f5566	AMDGPU/SI: Use the correct scratch wave offset register for shaders. Summary: The code previously always used s1 as it was using the user + system SGPR information for compute kernels. This is incorrect for Mesa shaders though, The register should be the next SGPR after all user and system SGPR's. We use that Mesa adds arguments for all input and system SGPR's and take the next available SGPR for the scratch wave offset register. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewers: mareko, arsenm, nhaehnle, tstellarAMD Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18941 Patch By: Bas Nieuwenhuizen llvm-svn: 266336	2016-04-14 16:27:03 +00:00
Simon Dardis	46f0e9cf63	Summary: Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils. This patch was previous committed as r266055 as seemed to have caused some spurious test failures. They did not reappear after further local testing. llvm-svn: 266301	2016-04-14 13:43:17 +00:00
Vasileios Kalintiris	1ea0fc57ba	[mips] Remove duplicate tests and add missing prefixes for *-LABEL checks. NFC. Summary: The only difference between the removed tests and the pre-existing ones, is the materialization of the zero constant, which shouldn't matter for these cases. Reviewers: dsanders, sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D18693 llvm-svn: 266285	2016-04-14 09:13:13 +00:00
Adam Nemet	aa40b369a6	Revert "Support arbitrary addrspace pointers in masked load/store intrinsics" This reverts commit r266086. It breaks the LTO build of gcc in SPEC2000. llvm-svn: 266282	2016-04-14 08:47:17 +00:00
David Majnemer	d2ed420815	[CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only one of the inputs is NaN: -NUM will return the non-NaN argument while -NAN would return NaN. It is desirable to lower to @llvm.{min,max}num to -NAN if they don't have a native instruction for -NUM. Notably, ARMv7 NEON's vmin has the -NAN semantics. N.B. Of course, it is only safe to do this if the intrinsic call is marked nnan. llvm-svn: 266279	2016-04-14 07:13:24 +00:00
Matt Arsenault	489a8fbeea	AMDGPU: Implement canonicalize Also add generic DAG node for it. llvm-svn: 266272	2016-04-14 01:42:16 +00:00
Sanjay Patel	4fd6cc9cca	[ppc] add tests to show potential andc optimization llvm-svn: 266261	2016-04-13 23:23:30 +00:00
Tim Northover	3d26dcef22	ARM: override cost function to re-enable ConstantHoisting (& fix it). At some point, ARM stopped getting any benefit from ConstantHoisting because the pass called a different variant of getIntImmCost. Reimplementing the correct variant revealed some problems, however: + ConstantHoisting was modifying switch statements. This is simply invalid, the cases must remain integer constants no matter the notional cost. + ConstantHoisting was mangling alloca instructions in the entry block. These should be handled by FrameLowering, so constants actually have a cost of 0. Worse, the resulting bitcasts meant they became dynamic allocas. rdar://25707382 llvm-svn: 266260	2016-04-13 23:08:27 +00:00
Matthias Braun	1f20820c09	ARM: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D18901 llvm-svn: 266253	2016-04-13 21:43:25 +00:00
Matthias Braun	fc08c62acd	X86: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D18902 llvm-svn: 266252	2016-04-13 21:43:21 +00:00
Matthias Braun	40d6ea2b7b	AArch64: Use a callee save registers for swiftself parameters It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D19007 llvm-svn: 266251	2016-04-13 21:43:16 +00:00
Sanjay Patel	d32f647225	[x86] add tests to show potential BMI optimization llvm-svn: 266243	2016-04-13 20:40:43 +00:00
Evandro Menezes	076242a2eb	[AArch64] Disable LDP/STP for quads Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs of regular LDR/STR. Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>. llvm-svn: 266223	2016-04-13 18:31:45 +00:00
Nirav Dave	abf596136a	Cleanup Store Merging in UseAA case This patch fixes a bug (PR26827) when using anti-aliasing in store merging. This sets the chain users of the component stores to point to the new store instead of the component stores chain parent. Reviewers: jyknight Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18909 llvm-svn: 266217	2016-04-13 17:27:26 +00:00
Tim Northover	f9f3caeceb	AArch64: don't create instructions that write to xzr/wzr twice. These are unpredictable even on AArch64. Patch by Yichao Yu. llvm-svn: 266206	2016-04-13 16:25:39 +00:00
Artem Tamazov	ead8b434de	[AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)git status Tests added along with implemented feature. Note that there is a small leftover of unecessary MI sheduling issue (more info in the review). CodeGen/AMDGPU/salu-to-valu.ll updated to fix the false regression. TODO: Support for TTMP quads, comma-separated syntax in "[]" and more. Differential Revision: http://reviews.llvm.org/D17825 llvm-svn: 266205	2016-04-13 16:18:41 +00:00
Zoran Jovanovic	9cd420ea3a	[mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly Differential Revision: http://reviews.llvm.org/D18995 llvm-svn: 266204	2016-04-13 16:02:25 +00:00
Vasileios Kalintiris	93f6871925	[mips] Sign-extend i32 values truncated from previously zero-extended i32 values. Summary: This is a special case for MIPS64 because the architecture requires properly 32-bit sign-extended values in the register containers. Additionaly, we merge consecutive trunc + AssertZExt nodes in order to avoid unnecessary sign-extensions when the extension comes from a type smaller than i32. Reviewers: dsanders Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D18893 llvm-svn: 266203	2016-04-13 15:07:45 +00:00
Simon Pilgrim	07bfcfd5f1	[X86][SSE] Regenerated vector integer absolute tests llvm-svn: 266194	2016-04-13 12:40:22 +00:00
Simon Pilgrim	8a8bff24e5	Added missing autogeneration note llvm-svn: 266185	2016-04-13 09:28:44 +00:00
Zlatko Buljan	b4097ad285	[mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions Differential Revision: http://reviews.llvm.org/D17137 This patch was reverted after the revertion of dependant patch http://reviews.llvm.org/D17068. There was the problem with test-suite failure. The problem is hopefully solved with dependant patch so this patch is commited again. llvm-svn: 266179	2016-04-13 08:02:26 +00:00
Hrvoje Varga	acf02ee748	[mips][microMIPS] Fix for "Cannot copy registers" assertion Differential Revision: http://reviews.llvm.org/D17068 This changes contains fix for failing test-suite. So, this patch should hopefully work now. llvm-svn: 266171	2016-04-13 06:17:21 +00:00
Wei Mi	a92f4c62f1	Recommit r265547, and r265610,r265639,r265657 on top of it, plus two fixes with one about error verify-regalloc reported, and another about live range update of phi after rematerialization. r265547: Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Patches on top of r265547: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" Differential Revision: http://reviews.llvm.org/D15302 Differential Revision: http://reviews.llvm.org/D18934 Differential Revision: http://reviews.llvm.org/D18935 Differential Revision: http://reviews.llvm.org/D18936 llvm-svn: 266162	2016-04-13 03:08:27 +00:00
Matt Arsenault	677106a8c7	AMDGPU: Add test for m0 initialization in basic loop Initialization of m0 is emitted for each LDS operation, so every block with LDS usage ends up with one. MachineLICM used to fail to hoist this out of the loop, so every loop iteration with LDS usage in it would re-initialize it. This seems to be fixed now, so add a test to make sure that it stays this way. llvm-svn: 266156	2016-04-13 00:39:52 +00:00
Justin Bogner	a980977c67	CodeGen: Clear the MFI's save and restore point after PrologEpilogInserter This state is no longer useful and not guaranteed to be valid in later codegen passes. For example, see the added test, which would print a savepoint of %bb.-1 without this change, and crashes with a use-after-free error under ASan if you apply the recycling allocator patch from llvm.org/PR26808. llvm-svn: 266150	2016-04-12 23:21:53 +00:00
Nicolai Haehnle	30a743add7	AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics Summary: They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and atomic counters at least when robust buffer access behavior is desired. (These instructions perform no format conversion and do buffer range checking per component.) As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format, it has become trivial to add support for the f32 and v2f32 variants of that intrinsic, so the patch does so. Also DAG-ify (and fix) some tests that I noticed intermittent failures in while developing this patch. Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects changes to the BUFFER_STORE_DWORD* instructions. See also http://reviews.llvm.org/D18291. Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18292 llvm-svn: 266126	2016-04-12 21:18:10 +00:00
Derek Schuff	887911441a	[WebAssembly] Fix debug info in reg-stackify.ll test It lacked a CU and thus became invalid with r266102 llvm-svn: 266114	2016-04-12 20:12:05 +00:00
Tom Stellard	ec17b58f39	AMDGPU/SI: Insert wait states required after v_readfirstlane on SI Summary: We will be able to handle this case much better once the hazard recognizer is finished, but this conservative implementation fixes a hang with the piglit test: spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra Reviewers: arsenm, nhaehnle Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18988 llvm-svn: 266105	2016-04-12 18:40:43 +00:00
Matt Arsenault	0740a1bead	AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32 This helps clean up some of the mess when expanding unaligned 64-bit loads when changed to be promote to v2i32, and fixes situations where or x, 0 was emitted after splitting 64-bit ors during moveToVALU. I think this could be a generic combine but I'm not sure. llvm-svn: 266104	2016-04-12 18:24:38 +00:00
Nicolai Haehnle	be6e455946	AMDGPU/SI: Fix a mis-compilation of multi-level breaks Summary: Under certain circumstances, multi-level breaks (or what is understood by the control flow passes as such) could be miscompiled in a way that causes infinite loops, by emitting incorrect control flow intrinsics. This fixes a hang in dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18967 llvm-svn: 266088	2016-04-12 16:10:38 +00:00
Artur Pilipenko	d9a8d9a1af	Support arbitrary addrspace pointers in masked load/store intrinsics This is a resubmittion of 263158 change. This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace. The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics. Reviewed By: reames Differential Revision: http://reviews.llvm.org/D17270 llvm-svn: 266086	2016-04-12 15:58:04 +00:00
Geoff Berry	6cb5fc80e8	[ScheduleDAGInstrs] Handle instructions with multiple MMOs Summary: In getUnderlyingObjectsForInstr(): Don't give up on instructions with multiple MMOs, instead look through all the MMOs and if they all meet the conservative criteria previously used for single MMO instructions, then return all of the underlying objects derived from the MMOs. The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid the case where multiple underlying objects are present and are related in such a way that successive iterations of the loop end up adding a dependency from an instruction to itself. Reviewers: atrick, hfinkel Subscribers: MatzeB, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D18093 llvm-svn: 266084	2016-04-12 15:50:19 +00:00
Than McIntosh	b927171cd9	Test commit, NFC. Adds a blank line. llvm-svn: 266082	2016-04-12 15:35:05 +00:00
Matt Arsenault	007a459aa3	AMDGPU: Implement i64 global atomics llvm-svn: 266075	2016-04-12 14:05:11 +00:00
Matt Arsenault	f100af354c	AMDGPU: Add atomic_inc + atomic_dec intrinsics These are different than atomicrmw add 1 because they have an additional input value to clamp the result. llvm-svn: 266074	2016-04-12 14:05:04 +00:00
Matt Arsenault	7193bf43c5	AMDGPU: Add volatile to test loads and stores When the memory vectorizer is enabled, these tests break. These tests don't really care about the memory instructions, and it's easier to write check lines with the unmerged loads. llvm-svn: 266071	2016-04-12 13:38:18 +00:00
Simon Pilgrim	c6acd56fdb	[X86] Regenerated avx512 calling convention test checks llvm-svn: 266070	2016-04-12 13:31:01 +00:00
Simon Dardis	0af53c5560	Revert "[mips] MIPSR6 Compact branch aliases" This reverts commit r266055. ps4-buildslave2 is highlighting a failure. llvm-svn: 266061	2016-04-12 12:22:45 +00:00
Simon Dardis	d847d985b9	[mips] MIPSR6 Compact branch aliases Summary: Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like binutils. Reviewers: dsanders Differential Revision: http://reviews.llvm.org/D18856 llvm-svn: 266055	2016-04-12 10:41:53 +00:00
Chuang-Yu Cheng	19e3905d51	[PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0 Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046). The PPCInstrInfo::optimizeCompareInstr function could create a new use of CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of CR0 is created. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D18884 llvm-svn: 266040	2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng	83dddda4a1	[PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field In the ELFv2 ABI, we are not required to save all CR fields. If only one nonvolatile CR field is clobbered, use mfocrf instead of mfcr to selectively save the field, because mfocrf has short latency compares to mfcr. Thanks Nemanja's invaluable hint! Reviewers: nemanjai tjablin hfinkel kbarton http://reviews.llvm.org/D17749 llvm-svn: 266038	2016-04-12 03:04:44 +00:00
Quentin Colombet	c4bc2f5fb2	[AArch64] Add test cases for the repairing of physical registers. llvm-svn: 266030	2016-04-12 00:43:40 +00:00
Quentin Colombet	049dfdbb3c	[AArch64] Add a test case for the propagation of register banks through phis. llvm-svn: 266028	2016-04-12 00:32:55 +00:00
Quentin Colombet	5b91aa8e53	[AArch64] Add a test case for the repairing of definitions. llvm-svn: 266026	2016-04-12 00:25:22 +00:00
Quentin Colombet	df50861994	[AArch64] Test that RegBankSelect inserts the proper copies to fix the register bank assignments. llvm-svn: 266021	2016-04-12 00:00:42 +00:00
Evgeniy Stepanov	b10de1bbaa	[safestack] Add canary to unsafe stack frames Add StackProtector to SafeStack. This adds limited protection against data corruption in the caller frame. Current implementation treats all stack protector levels as -fstack-protector-all. llvm-svn: 266004	2016-04-11 22:27:48 +00:00
Tim Northover	83aa2384f4	ARM: use r7 as the frame-pointer on all MachO targets. This is better for a few reasons: + It matches the other tooling for iOS. + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't use ARM mode). + It leads to infinitesimally smaller code (0.2%, yay!). rdar://25369506 llvm-svn: 266003	2016-04-11 22:27:40 +00:00
Manman Ren	41e16ffe96	swifterror: fix up a testing case. llvm-svn: 266000	2016-04-11 21:45:33 +00:00
Simon Pilgrim	02dd72b456	[DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOprs xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well. The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles. I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result. Differential Revision: http://reviews.llvm.org/D18944 llvm-svn: 265998	2016-04-11 21:10:33 +00:00
Manman Ren	e6636caf43	Swift Calling Convention: swifterror target support. Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997	2016-04-11 21:08:06 +00:00
Tom Stellard	2898e45c97	Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute" This reverts commit r263720. Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute. llvm-svn: 265992	2016-04-11 20:38:40 +00:00
Adrian Prantl	dac3cdd613	More upgrading of old- and very-old-style debug info in testcases. llvm-svn: 265953	2016-04-11 15:53:44 +00:00
Petar Jovanovic	a8eb1f20c0	[mips] Make Static a default relocation model for MIPS codegen This change follows up defaults for GCC and Clang, so LLVM does not differ from them. While number of the test files are touched with this change, they all keep the old (expected) behaviour with the explicit option: "-relocation-model=pic" The tests that have not been touched are insensitive to relocation model. Differential Revision: http://reviews.llvm.org/D17995 llvm-svn: 265949	2016-04-11 15:24:23 +00:00
Ulrich Weigand	8612094f9a	[SystemZ] Support conditional indirect sibling calls via BCR This adds a conditional variant of CallBR instruction, CallBCR. Also, it can be fused with integer comparisons, resulting in one of the new C*BCall instructions. In addition to CallBRCL limitations, this has another one: it won't trigger if the function to call isn't already in %r1 - see f22 in the test for an example (it's also why the loads in tests are volatile). Author: koriakin Differential Revision: http://reviews.llvm.org/D18928 llvm-svn: 265933	2016-04-11 12:12:32 +00:00
Simon Pilgrim	543762a4a3	[X86] Added extra widening tests for and/xor/or bit operations Add tests for bitcasting an illegal vector to/from a legal scalar Additional tests requested for D18944 llvm-svn: 265930	2016-04-11 11:10:36 +00:00
Simon Pilgrim	9a8e91143a	[X86] Added extra widening tests for and/xor/or bit operations To make sure we're dealing with both cases of legal/illegal number of vector elements and legal/illegal vector element types llvm-svn: 265929	2016-04-11 10:58:52 +00:00
Simon Pilgrim	6a1e2f4957	[X86] Regenerated sdglue test checks llvm-svn: 265927	2016-04-11 10:22:05 +00:00
Simon Pilgrim	bfed1985be	[X86] Added widening tests for and/xor/or bit operations Part of additional tests requested for D18944 llvm-svn: 265925	2016-04-11 10:16:27 +00:00
Simon Pilgrim	d0a0ec976f	[X86][AVX512] Add vector integer division by constant tests Added sdiv/srem and udiv/urem tests cases for 512 bit vectors. llvm-svn: 265903	2016-04-10 17:14:26 +00:00
Simon Pilgrim	3b0d269398	[X86][AVX512BW] Add support for v64i8 multiplies Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets. I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL. Differential Revision: http://reviews.llvm.org/D18937 llvm-svn: 265902	2016-04-10 17:02:48 +00:00
Simon Pilgrim	7b63426d6a	[X86][AVX512] Regenerated mask op tests llvm-svn: 265898	2016-04-10 14:16:03 +00:00
Charles Davis	9f6526358e	[CodeGen] Don't assume that fixed stack objects are aligned in a stack-realigned function. Summary: After we make the adjustment, we can assume that for local allocas, but not for stack parameters, the return address, or any other fixed stack object (which has a negative offset and therefore lies prior to the adjusted SP). Fixes PR26662. Reviewers: hfinkel, qcolombet, rnk Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D18471 llvm-svn: 265886	2016-04-09 23:34:42 +00:00
Sanjay Patel	ea5cc7e72d	[x86] use BMI 'andn' for logic + compare ops With BMI, we can use 'andn' to save an instruction when the result is only used in a compare. This is related to one of the potential sequences to check 'isfinite' in: https://llvm.org/bugs/show_bug.cgi?id=27164 Differential Revision: http://reviews.llvm.org/D18910 llvm-svn: 265875	2016-04-09 16:02:52 +00:00
Simon Pilgrim	c8f7b4ec60	[X86][XOP] Support for VPPERM 2-input shuffle mask decoding This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp. Differential Revision: http://reviews.llvm.org/D18441 llvm-svn: 265874	2016-04-09 14:51:26 +00:00
Sanjay Patel	9fcd4fe510	[x86] show missed opportunities to use andn llvm-svn: 265850	2016-04-08 21:26:11 +00:00
Sanjay Patel	a10f6f5022	[x86] regenerate checks for BMI tests llvm-svn: 265841	2016-04-08 20:29:39 +00:00
James Y Knight	dfd040c430	Add trailing colons to labels in a test. This will avoid matching on the FILENAME if it happened to contain, say, "f4" anywhere in the file path. llvm-svn: 265837	2016-04-08 19:49:03 +00:00
Nirav Dave	7b6e012d2f	Fix Load Control Dependence in MemCpy Generation In Memcpy lowering we had missed a dependence from the load of the operation to successor operations. This causes us to potentially construct an in initial DAG with a memory dependence not fully represented in the chain sub-DAG but rather require looking at the entire DAG breaking alias analysis by allowing incorrect repositioning of memory operations. To work around this, r200033 changed DAGCombiner::GatherAllAliases to be conservative if any possible issues to happen. Unfortunately this check forbade many non-problematic situations as well. For example, it's common for incoming argument lowering to add a non-aliasing load hanging off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would find that other (unvisited) use of the EntryNode chain, and just give up entirely. Furthermore, the check was incomplete: it would not actually detect all such potentially problematic DAG constructions, because GatherAllAliases did not guarantee to visit all chain nodes going up to the root EntryNode. This is in general fine -- giving up early will just miss a potential optimization, not generate incorrect results. But, for this non-chain dependency detection code, it's possible that you could have a load attached to a higher-up chain node than any which were visited. If that load aliases your store, but the only dependency is through the value operand of a non-aliasing store, it would've been missed by this code, and potentially reordered. With the dependence added, this check can be removed and Alias Analysis can be much more aggressive. This fixes code quality regression in the Consecutive Store Merge cleanup (D14834). Test Change: ppc64-align-long-double.ll now may see multiple serializations of its stores Differential Revision: http://reviews.llvm.org/D18062 llvm-svn: 265836	2016-04-08 19:44:40 +00:00
Kevin B. Smith	5defc888fe	[X86] Fix PR23155 by turning on X86FixupBWInsts by default. Differential Revision: http://reviews.llvm.org/D18866 llvm-svn: 265830	2016-04-08 18:58:29 +00:00
Colin LeMahieu	ed62b97b18	Revert r265817 lld tests need to be addressed. llvm-svn: 265822	2016-04-08 18:15:37 +00:00
Colin LeMahieu	eaba356a61	[llvm-objdump] Printing hex instead of dec by default Differential Revision: http://reviews.llvm.org/D18770 llvm-svn: 265817	2016-04-08 17:55:03 +00:00
Ulrich Weigand	7102a6833f	[SystemZ] Support conditional sibling calls via BRCL This adds a conditional variant of CallJG instruction, CallBRCL. It can be used for conditional sibling calls. Unfortunately, due to IfCvt limitations, it only really works well for functions without arguments. Author: koriakin Differential Revision: http://reviews.llvm.org/D18864 llvm-svn: 265814	2016-04-08 17:22:19 +00:00
Quentin Colombet	9795e49e82	[AArch64] Add a test case for the default mapping of RegBankSelect. llvm-svn: 265811	2016-04-08 17:11:51 +00:00
Quentin Colombet	6fa8f3c563	[MIR] Teach the parser how to deal with register banks. llvm-svn: 265802	2016-04-08 16:40:43 +00:00
Sam Parker	e40fc81f76	[ARM] Enable SMLAW[B\|T] and SMLUW[B\|T] instruction selection Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt, smulwb and smulwt instructions for the ARM backend. Also updated the smul CodeGen test and removed the smulw one. Differential Revision: http://reviews.llvm.org/D18892 llvm-svn: 265793	2016-04-08 16:02:53 +00:00
Hans Wennborg	8c75bb0137	Revert r265547 "Recommit r265309 after fixed an invalid memory reference bug happened" It caused PR27275: "ARM: Bad machine code: Using an undefined physical register" Also reverting the following commits that were landed on top: r265610 "Fix the compare-clang diff error introduced by r265547." r265639 "Fix the sanitizer bootstrap error in r265547." r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]" llvm-svn: 265790	2016-04-08 15:17:43 +00:00
Simon Pilgrim	9cffc0286e	[X86][SSE] Added 32-bit tests for vector lzcnt/tzcnt tests v2i64 tests are particularly bad on 32-bit targets. llvm-svn: 265789	2016-04-08 15:01:31 +00:00
Chuang-Yu Cheng	6e4b4f696f	CXX_FAST_TLS calling convention: performance improvement for PPC64 This is the same change on PPC64 as r255821 on AArch64. I have even borrowed his commit message. The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given machine function and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. Author: Tom Jablin (tjablin) Reviewers: hfinkel kbarton cycheng http://reviews.llvm.org/D17533 llvm-svn: 265781	2016-04-08 12:04:32 +00:00
Zlatko Buljan	a49b48852d	[mips][microMIPS] Add CodeGen support for ADD, ADDIU, ADDU and DADD* instructions Differential Revision: http://reviews.llvm.org/D16454 llvm-svn: 265772	2016-04-08 07:27:26 +00:00
Dmitry Polukhin	a028bd777b	[IFUNC] Fix ifunc-asm.ll test It seems that llc cannot be called used in assembler tests so test that checks asm for particular target needs to be moved to codegen. llvm-svn: 265770	2016-04-08 06:45:19 +00:00
Amaury Sechet	fe775804df	Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule Summary: EHPad BB are not entered the classic way and therefor do not need to be placed after their predecessors. This patch make sure EHPad BB are not chosen amongst successors to form chains, and are selected as last resort when selecting the best candidate. EHPad are scheduled in reverse probability order in order to have them flow into each others naturally. Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D17625 llvm-svn: 265726	2016-04-07 21:29:39 +00:00
Jan Vesely	e71f70898a	AMDGPU/SI: Implement atomic load/store for i32 and i64 Standard load/store instructions with GLC bit set. Reviewers: tstellardAMD, arsenm Differential Revision: http://reviews.llvm.org/D18760 llvm-svn: 265709	2016-04-07 19:23:11 +00:00
Tom Stellard	4a205248ea	AMDGPU/SI: Add latency for export instructions Reviewers: arsenm, nhaehnle Subscribers: nhaehnle, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18599 llvm-svn: 265708	2016-04-07 18:30:05 +00:00
Simon Pilgrim	f643a7348a	[X86][SSE] Added bitmask pattern shuffle tests Based on OR(AND(MASK,V0),AND(~MASK,V1)) style patterns llvm-svn: 265697	2016-04-07 17:23:55 +00:00
Kevin B. Smith	b1d0008007	[X86]: Fix for PR27251. Differential Revision: http://reviews.llvm.org/D18850 llvm-svn: 265690	2016-04-07 16:15:34 +00:00
Ulrich Weigand	80d5f68422	[SystemZ] Implement conditional returns Return is now considered a predicable instruction, and is converted to a newly-added CondReturn (which maps to BCR to %r14) instruction by the if conversion pass. Also, fused compare-and-branch transform knows about conditional returns, emitting the proper fused instructions for them. This transform triggers on a lot of tests, hence the huge diffstat. The changes are mostly jX to br %r14 -> bXr %r14. Author: koriakin Differential Revision: http://reviews.llvm.org/D17339 llvm-svn: 265689	2016-04-07 16:11:44 +00:00
Ehsan Amiri	36d2ad6539	[PPC] Enable transformations in PPCPassConfig::addIRPasses at O2 http://reviews.llvm.org/D18562 A large number of testcases has been modified so they pass after this test. One testcase is deleted, because I realized even after undoing the original change that was committed with this testcase, the testcase still passes. So I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll) is an arbitrary change to keep it passing. Given the original intention of the testcase, and the fact that fixing it will require some time to change the testcase, we concluded that this quick change will be enough. llvm-svn: 265683	2016-04-07 15:30:55 +00:00
Simon Pilgrim	2e3347a5a5	[X86][SSE] Add support for VZEXT constant folding llvm-svn: 265646	2016-04-07 07:52:45 +00:00
Ahmed Bougacha	d78960d142	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. Re-apply r265450 which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add->add_and_fetch combine only works in specific, but pretty common, cases: (icmp slt x, 0) -> (icmp sle (add x, 1), 0) (icmp sge x, 0) -> (icmp sgt (add x, 1), 0) (icmp sle x, 0) -> (icmp slt (sub x, 1), 0) (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0) Original Message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265636	2016-04-07 02:07:10 +00:00
Ahmed Bougacha	f5a4ddf62c	[X86] Refresh and tweak EFLAGS reuse tests. NFC. The non-1 and EQ/NE tests were misguided. llvm-svn: 265635	2016-04-07 02:06:53 +00:00
Hans Wennborg	3add36fb90	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and to not add merged instructions' offests to the CFI about to be generated. This is already covered by the lit tests; I just got the expectations wrong previously. llvm-svn: 265623	2016-04-07 00:05:49 +00:00
Ehsan Amiri	ee69c81e47	[PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP http://reviews.llvm.org/D18405 When the integer value loaded is never used directly as integer we should use VSX or Floating Point Facility integer loads and avoid extra direct move llvm-svn: 265593	2016-04-06 20:12:29 +00:00
Nicolai Haehnle	69b2d0adeb	AMDGPU: Add a shader calling convention This makes it possible to distinguish between mesa shaders and other kernels even in the presence of compute shaders. Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Differential Revision: http://reviews.llvm.org/D18559 llvm-svn: 265589	2016-04-06 19:40:20 +00:00
Hans Wennborg	f357d442d6	Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC." It caused ASan 32-bit tests to hang (PR27245). llvm-svn: 265559	2016-04-06 16:44:38 +00:00
Hans Wennborg	657e738668	Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551	2016-04-06 16:10:20 +00:00
Wei Mi	ec02e9ab60	Recommit r265309 after fixed an invalid memory reference bug happened when DenseMap growed and moved memory. I verified it fixed the bootstrap problem on x86_64-linux-gnu but I cannot verify whether it fixes the bootstrap error on clang-ppc64be-linux. I will watch the build-bot result closely. Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Differential Revision: http://reviews.llvm.org/D15302 llvm-svn: 265547	2016-04-06 15:41:07 +00:00
Chuang-Yu Cheng	a5a77bb95c	[ppc64] Temporary disable sibling call optimization on ppc64 due to breaking test case r265506 breaks print-stack-trace.cc test case of compiler-rt in bootstrap test. http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708 llvm-svn: 265528	2016-04-06 10:48:36 +00:00
Chuang-Yu Cheng	de5217e247	[ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi This patch enable sibling call optimization on ppc64 ELFv1/ELFv2 abi, and add a couple of test cases. This patch also passed llvm/clang bootstrap test, and spec2006 build/run/result validation. Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617 Great thanks to Tom's (tjablin) help, he contributed a lot to this patch. Thanks Hal and Kit's invaluable opinions! Reviewers: hfinkel kbarton http://reviews.llvm.org/D16315 llvm-svn: 265506	2016-04-06 02:04:38 +00:00
Sanjoy Das	30e0638513	Lower @llvm.experimental.deoptimize as a noreturn call While preserving the return value for @llvm.experimental.deoptimize at the IR level is useful during mid-level optimization, doing so at the machine instruction level requires generating some extra code and a return that is non-ideal. This change has LLVM lower ``` %val = call @llvm.experimental.deoptimize ret %val ``` to effectively ``` call @__llvm_deoptimize() unreachable ``` instead. llvm-svn: 265502	2016-04-06 01:33:49 +00:00
Davide Italiano	7f421e29c0	[DebugInfo] Fix tests so that each subprogram belongs to a CU. llvm-svn: 265490	2016-04-05 23:37:08 +00:00
Manman Ren	6e8ec540e6	Swift Calling Convention: swiftcc for ARM. Differential Revision: http://reviews.llvm.org/D18769 llvm-svn: 265482	2016-04-05 22:44:44 +00:00
Evgeniy Stepanov	c80f414fe8	Faster stack-protector for Android/AArch64. Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard. llvm-svn: 265481	2016-04-05 22:41:50 +00:00
Manman Ren	d8f96bea63	Swift Calling Convention: add swiftcc. Differential Revision: http://reviews.llvm.org/D17863 llvm-svn: 265480	2016-04-05 22:41:47 +00:00
Ahmed Bougacha	6921e4df62	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265450	2016-04-05 20:02:57 +00:00
Ahmed Bougacha	5fe8ea46ef	[X86] Add tests for ATOMIC_LOAD_OP EFLAGS reuse. NFC. llvm-svn: 265448	2016-04-05 20:02:44 +00:00
Quentin Colombet	a1f70e5494	[AArch64][Test] Do not override the suffixes for test cases. llvm-svn: 265441	2016-04-05 19:26:42 +00:00
Sanjay Patel	7cd7867725	[x86] regenerate checks utils/update_test_checks.py was improved with: http://reviews.llvm.org/rL265414 to include the first line of the function (expected to be a comment line). This ensures that nothing bad has happened before the first actual line of checked asm. It also matches the existing behavior of the old script. llvm-svn: 265416	2016-04-05 17:12:19 +00:00
JF Bastien	07072c3354	WebAssembly: fix cfg-stackify test It was broken by reshuffling induced by r265397 'Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge'. llvm-svn: 265415	2016-04-05 17:01:52 +00:00
Jacques Pienaar	6e9736dc21	[lanai] LanaiSetflagAluCombiner more conservative Summary: LanaiSetflagAluCombiner could previously combine instructions across basic building blocks even when not legal. Make the LanaiSetflagAluCombiner more conservative to avoid this. Reviewers: eliben Subscribers: joker.eph, llvm-commits Differential Revision: http://reviews.llvm.org/D18746 llvm-svn: 265411	2016-04-05 16:18:13 +00:00
Konstantin Zhuravlyov	57ddfb8bca	[AMDGPU] Emit linkonce and linkonce_odr symbols Differential Revision: http://reviews.llvm.org/D18726 llvm-svn: 265408	2016-04-05 16:00:58 +00:00
Chuang-Yu Cheng	27fda4cfaf	Add missing test for the "Don't delete empty preheaders" added in r265397 Author: Tom Jablin (tjablin) llvm-svn: 265402	2016-04-05 14:21:32 +00:00
Chuang-Yu Cheng	13c5bb4a2e	Don't delete empty preheaders in CodeGenPrepare if it would create a critical edge Presently, CodeGenPrepare deletes all nearly empty (only phi and branch) basic blocks. This pass can delete loop preheaders which frequently creates critical edges. A preheader can be a convenient place to spill registers to the stack. If the entrance to a loop body is a critical edge, then spills may occur in the loop body rather than immediately before it. This patch protects loop preheaders from deletion in CodeGenPrepare even if they are nearly empty. Since the patch alters the CFG, it affects a large number of test cases. In most cases, the changes are merely cosmetic (basic blocks have different names or instruction orders change slightly). I am somewhat concerned about the test/CodeGen/Mips/brdelayslot.ll test case. If the loop preheader is not deleted, then the MIPS backend does not take advantage of a branch delay slot. Consequently, I would like some close review by a MIPS expert. The patch also partially subsumes D16893 from George Burgess IV. George correctly notes that CodeGenPrepare does not actually preserve the dominator tree. I think the dominator tree was usually not valid when CodeGenPrepare ran, but I am using LoopInfo to mark preheaders, so the dominator tree is now always valid before CodeGenPrepare. Author: Tom Jablin (tjablin) Reviewers: hfinkel george.burgess.iv vkalintiris dsanders kbarton cycheng http://reviews.llvm.org/D16984 llvm-svn: 265397	2016-04-05 14:06:20 +00:00
Simon Dardis	edd0920371	[mips] MIPSR6 Compact jump support This patch adds support for compact jumps similiar to the previous compact branch support for MIPSR6. Unlike compact branches, compact jumps do not have a forbidden slot. As MipsInstrInfo::getEquivalentCompactForm can determine the correct expansion for jumps and branches for both microMIPS and MIPSR6, remove the unnecessary distinction in the delay slot filler. Reviewers: vkalintiris Subscribers: llvm-commits, dsanders llvm-svn: 265390	2016-04-05 12:50:29 +00:00
Justin Holewinski	234f7b3e41	[NVPTX] Handle ldg created from sign-/zero-extended load Reviewers: jingyue Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D18053 llvm-svn: 265389	2016-04-05 12:38:01 +00:00
Teresa Johnson	b4c7b9ac14	Don't fold double constant to an integer if dest type not integral Summary: I encountered this issue when constant folding during inlining tried to fold away a bitcast of a double to an x86_mmx, which is not an integral type. The test case exposes the same issue with a smaller code snippet during early CSE. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18528 llvm-svn: 265367	2016-04-04 23:50:46 +00:00
Matthias Braun	14367ed4c8	test: Always treat .mir files as tests even outside of CodeGen/MIR We missed a handful of .mir tests that existed outside the test/CodeGen/MIR directory. Also fix the three powerpc .mir tests that nobody noticed were broken. llvm-svn: 265350	2016-04-04 21:23:44 +00:00
Hans Wennborg	fa531d7979	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" The original commit miscompiled things on 32-bit Windows, e.g. a Clang boostrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code: addl $12, %esp leal -4(%ebp), %esp To be "optimized" into simply: addl $8, %esp This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation. llvm-svn: 265345	2016-04-04 21:02:46 +00:00
Sean Silva	d7d001c930	Beef up some dllexport tests. Adds some dllexport tests to verify that: - Variables in bss are exported appropriately - Non-dllexport symbols aliased to dllexport symbols are not exported - Symbols declared as dllexport but are not defined are not exported We plan to enable dllimport/dllexport support for the PS4, and these additional tests are for points we noticed in our internal testing. Patch by Warren Ristow! Differential Revision: http://reviews.llvm.org/D18682 llvm-svn: 265333	2016-04-04 19:10:55 +00:00
Matthias Braun	9984790824	ARM, AArch64, X86: Check preserved registers for tail calls. We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserver_mostcc or cxx_fast_tls. It was explicitely handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329	2016-04-04 18:56:13 +00:00
Wei Mi	dc83ff4641	Revert r265309 and r265312 because they caused some errors I need to investigate. llvm-svn: 265317	2016-04-04 17:45:03 +00:00
Derek Schuff	e26993d076	Add MachineFunctionProperty checks for AllVRegsAllocated for target passes Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation. Reviewers: qcolombet Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18525 llvm-svn: 265313	2016-04-04 17:09:25 +00:00
Wei Mi	acd9f8c813	Replace analyzeSiblingValues with new algorithm to fix its compile time issue. The patch is to solve PR17409 and its duplicates. analyzeSiblingValues is a N x N complexity algorithm where N is the number of siblings generated by reg splitting. Although it causes siginificant compile time issue when N is large, it is also important for performance since it removes redundent spills and enables rematerialization. To solve the compile time issue, the patch removes analyzeSiblingValues and replaces it with lower cost alternatives containing two parts. The first part creates a new spill hoisting method in postOptimization of register allocation. It does spill hoisting at once after all the spills are generated instead of inside every instance of selectOrSplit. The second part queries the define expr of the original register for rematerializaiton and keep it always available during register allocation even if it is already dead. It deletes those dead instructions only in postOptimization. With the two parts in the patch, it can remove analyzeSiblingValues without sacrificing performance. Differential Revision: http://reviews.llvm.org/D15302 llvm-svn: 265309	2016-04-04 16:42:40 +00:00
Ulrich Weigand	ef8c006d34	[SystemZ] Support ATOMIC_FENCE A cross-thread sequentially consistent fence should be lowered into z/Architecture's BCR serialization instruction, instead of causing a fatal error in the back-end. Author: bryanpkc Differential Revision: http://reviews.llvm.org/D18644 llvm-svn: 265292	2016-04-04 12:45:44 +00:00
Ulrich Weigand	c0457f68d4	[SystemZ] Support llvm.frameaddress/llvm.returnaddress intrinsics Enable the SystemZ back-end to lower FRAMEADDR and RETURNADDR, which previously would cause the back-end to crash. Currently, only a frame count of zero is supported. Author: bryanpkc Differential Revision: http://reviews.llvm.org/D18514 llvm-svn: 265291	2016-04-04 12:44:55 +00:00
Elena Demikhovsky	c8ec20583d	AVX-512: Truncating store for i1 vectors Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes. Differential Revision: http://reviews.llvm.org/D18740 llvm-svn: 265283	2016-04-04 07:17:47 +00:00
Simon Pilgrim	652e457182	[X86][SSE] Refreshed MOVMSK sign bit tests llvm-svn: 265267	2016-04-03 18:59:42 +00:00
Elena Demikhovsky	b1c5cedd43	AVX-512: Load and Extended Load for i1 vectors Implemented load+{sign\|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259	2016-04-03 08:41:12 +00:00
Zoran Jovanovic	9cae055d65	[mips][microMIPS] Revert commits r264245 and r264248. Commit r264245 was the reason for failing tests in LLVM test suite. Commit r264248 depends on the first one. llvm-svn: 265249	2016-04-02 23:06:13 +00:00

... 3 4 5 6 7 ...

15839 Commits