llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00

Author	SHA1	Message	Date
Simon Pilgrim	93256d06c8	Revert r268504 llvm-svn: 268526	2016-05-04 17:49:14 +00:00
Nemanja Ivanovic	d6a1215edc	[PowerPC] Generate VSX version of splat word This patch corresponds to review: http://reviews.llvm.org/D18592 It allows the PPC back end to generate the xxspltw instruction where we previously only emitted vspltw. llvm-svn: 268516	2016-05-04 16:04:02 +00:00
Simon Pilgrim	65259dd1a8	[X86][SSE] Regenerate vector bswap tests llvm-svn: 268514	2016-05-04 15:45:48 +00:00
Simon Pilgrim	ecd973d988	[SelectionDAG] BITREVERSE vector legalization of bit operations Vector bit operations are typically promoted instead of having custom lowering. This patch changes the isOperationLegalOrCustom tests for vector AND/OR operations to use isOperationLegalOrPromote instead, allowing the SSE implementations to stay on the simd unit. Differential Revision: http://reviews.llvm.org/D19805 llvm-svn: 268504	2016-05-04 15:01:13 +00:00
Elena Demikhovsky	b494e86dcd	The test files are auto-generated by update_llc_test_checks.py utility. No functional changes. llvm-svn: 268498	2016-05-04 14:31:18 +00:00
Chris Dewhurst	a2cf1867ba	[Sparc] Allow taking of function address into a register. Modification of previously existing code (variable rename only), with unit test added. Differential Revision: http://reviews.llvm.org/D19368 llvm-svn: 268493	2016-05-04 12:11:05 +00:00
Zlatko Buljan	3153f38fd5	[mips][microMIPS] Add CodeGen support for microMIPSr6 ROTR and ROTRV and add tests for LL, SC, SYSCALL, ROTR, ROTRV, LWM32, SWM32 and MOVEP instructions Differential Revision: http://reviews.llvm.org/D19857 llvm-svn: 268491	2016-05-04 12:02:12 +00:00
Chris Dewhurst	7a66203c71	[Sparc] Implement __builtin_setjmp, __builtin_longjmp back-end. This code implements builtin_setjmp and builtin_longjmp exception handling intrinsics for 32-bit Sparc back-ends. The code started as a mash-up of the PowerPC and X86 versions, although there are sufficient differences to both that had to be made for Sparc handling. Note: I have manual tests running. I'll work on a unit test and add that to the rest of this diff in the next day. Also, this implementation is only for 32-bit Sparc. I haven't focussed on a 64-bit version, although I have left the code in a prepared state for implementing this, including detecting pointer size and comments indicating where I suspect there may be differences. Differential Revision: http://reviews.llvm.org/D19798 llvm-svn: 268483	2016-05-04 09:33:30 +00:00
Daniel Sanders	c91fe0f456	[mips] Remove -mattr=+n64 and fix indentation in tailcall.ll RUN lines. NFC. -mattr=+n64 isn't the correct way to specify the ABI and N64 is already the default for the RUN line concerned. llvm-svn: 268482	2016-05-04 09:08:35 +00:00
David Majnemer	a2740cd234	[X86] Lower zext i1 arguments i1 is now a legal type for X86 with AVX512. There were some paths in X86FastISel which were not quite ready to see an i1 value: they were not quite sure how to deal with sign/zero extends for call arguments. DTRT by extending to i8 for zeroext and bailing out of FastISel for signext. This fixes PR27591. llvm-svn: 268470	2016-05-04 00:22:23 +00:00
Simon Pilgrim	195a8be3e5	[X86][XOP] Add placeholder VPERMIL2 combining tests llvm-svn: 268450	2016-05-03 21:55:37 +00:00
Tim Northover	06b0388bac	X86-Darwin: start emitting data-region directives for jump-tables. The surrounding tools can cope these days, and they were invented for a reason. llvm-svn: 268437	2016-05-03 21:03:41 +00:00
Quentin Colombet	5074b3866a	[ImplicitNullChecks] Account for implicit-defs as well when updating the liveness. The replaced load may have implicit-defs and those defs may be used in the block of the original load. Make sure to update the liveness accordingly. This is a generalization of r267817. llvm-svn: 268412	2016-05-03 18:09:06 +00:00
Simon Pilgrim	386b3c8531	[X86][SSE] Added target shuffle combine to MOVQ llvm-svn: 268391	2016-05-03 15:05:13 +00:00
Daniel Sanders	ec5e1f9b79	[mips][fastisel] ADJCALLSTACKUP has a second immediate operand. Summary: It's always zero for SelectionDAG and is never read by the MIPS backend so do the same for FastISel. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D19863 llvm-svn: 268386	2016-05-03 14:19:26 +00:00
Simon Pilgrim	d129c797cc	[X86][SSSE3] Missing combine opportunity to simplify to a MOVQ shuffle llvm-svn: 268378	2016-05-03 13:12:44 +00:00
Igor Breger	2e54ab6509	[AVX512] Add support for commutative MAX/MIN. In general, the VMAX{PS,PD} and VMIN{PS,PD} instructions are not commutative. The combine pass converts VMAX/VMIN to the commutative nodes VMAXC/VMINC only when UnsafeFPMath is used. Differential Revision: http://reviews.llvm.org/D19860 llvm-svn: 268375	2016-05-03 11:51:45 +00:00
Igor Breger	ab90b9e166	[AVX512] Fix lowerV4X128VectorShuffle to select input operands correctly. Differential Revision: http://reviews.llvm.org/D19803 llvm-svn: 268368	2016-05-03 08:08:44 +00:00
Matthias Braun	cd14842dc0	Fix uppercase typo llvm-svn: 268362	2016-05-03 05:21:53 +00:00
Matthias Braun	4869e8d140	AArch64/optimizeCondBranch: Remove earlier kill flag when forming TBZ This fixes -verify-machineinstrs complaints when compiling test-suite/SingleSource/Benchmarks/Shootout-C++/wordfreq.cpp llvm-svn: 268360	2016-05-03 04:54:16 +00:00
Quentin Colombet	cf0d20f78c	[MachineBlockPlacement] Let the target optimize the branches at the end. After the layout of the basic blocks is set, the target may be able to get rid of unconditional branches to fallthrough blocks that the generic code does not catch. This happens any time TargetInstrInfo::AnalyzeBranch is not able to analyze all the branches involved in the terminators sequence, while still understanding a few of them. In such situation, AnalyzeBranch can directly modify the branches if it has been instructed to do so. This patch takes advantage of that. llvm-svn: 268328	2016-05-02 22:58:59 +00:00
Quentin Colombet	6b53c89899	[X86] Model FAULTING_LOAD_OP as a terminator and branch. This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and moreover, other optimizations could have done wrong transformations! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2-byte offset in implicit-null-check.ll comes from the fact the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequent commit. llvm-svn: 268327	2016-05-02 22:58:54 +00:00
Matt Arsenault	dfb613a88d	AMDGPU: Custom lower v2i32 loads and stores This will allow us to split up 64-bit private accesses when necessary. llvm-svn: 268296	2016-05-02 20:13:51 +00:00
Tom Stellard	d541008932	AMDGPU/SI: Use v_readfirstlane_b32 when restoring SGPRs spilled to scratch We were using v_readlane_b32 with the lane set to zero, but this won't work if thread 0 is not active. Differential Revision: http://reviews.llvm.org/D19745 llvm-svn: 268295	2016-05-02 20:11:44 +00:00
Matt Arsenault	7932e530a0	AMDGPU: Make i64 loads/stores promote to v2i32 Now that unaligned access expansion should not attempt to produce i64 accesses, we can remove the hack in PreprocessISelDAG where this is done. This allows splitting i64 private accesses while allowing the new add nodes indexing the vector components to be folded with the base pointer arithmetic. llvm-svn: 268293	2016-05-02 20:07:26 +00:00
Simon Pilgrim	6c3bbc1c10	[X86][AVX2] Added 128-bit wide shuffle test Demonstrate missing 128-bit wide shuffle combine support llvm-svn: 268290	2016-05-02 19:46:58 +00:00
Tim Northover	d33c3d2654	ARM: fix handling of SUB immediates in peephole opt. We were negating an immediate that was going to be used in a SUBri form unnecessarily. Since ADD/SUB are very similar we can do that, but we have to change the SUB to an ADD at the same time. This also applies to ADD, and allows us to handle a slightly larger range of immediates for those two operations. rdar://25992245 llvm-svn: 268276	2016-05-02 18:30:08 +00:00
Justin Holewinski	9cc32d1b1b	[NVPTX] Fix sign/zero-extending ldg/ldu instruction selection Summary: We don't have sign-/zero-extending ldg/ldu instructions defined, so we need to emulate them with explicit CVTs. We were originally handling the i8 case, but not any other cases. Fixes PR26185 Reviewers: jingyue, jlebar Subscribers: jholewinski Differential Revision: http://reviews.llvm.org/D19615 llvm-svn: 268272	2016-05-02 18:12:02 +00:00
Tom Stellard	179b86b996	AMDGPU/SI: Use the hazard recognizer to break SMEM soft clauses Summary: Add support for detecting hazards in SMEM soft clauses, so that we only break the clauses when necessary, either by adding s_nop or re-ordering other alu instructions. Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18870 llvm-svn: 268260	2016-05-02 17:39:06 +00:00
Derek Schuff	f0cc027b2e	[WebAssembly] Rename memory_size intrinsic to current_memory This follows the recent renaming in the wasm spec. llvm-svn: 268255	2016-05-02 17:25:22 +00:00
Tom Stellard	7f58d124e5	AMDGPU/SI: Use hazard recognizer to detect DPP hazards Reviewers: arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18603 llvm-svn: 268247	2016-05-02 16:23:09 +00:00
David L Kreitzer	5e9178eeb3	Enable the X86 call frame optimization for the 64-bit targets that allow it. Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227	2016-05-02 13:45:25 +00:00
Jonas Paulsson	ee848f9766	[SystemZ] Temporarily disable codegen test int-add-12.ll. This checks for AGSI transformation, which is temporarily disabled. llvm-svn: 268219	2016-05-02 10:42:47 +00:00
Craig Topper	721f6428df	[AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200	2016-05-01 17:38:32 +00:00
Igor Breger	fa752e801d	getelementptr instruction, support index vector of EVT. Differential Revision: http://reviews.llvm.org/D19775 llvm-svn: 268195	2016-05-01 13:29:12 +00:00
Igor Breger	a0208b4462	Change how the AVX512 broadcastsd/ss patterns interact with spilling. The new implementation takes a scalar register and generates a vector without COPY_TO_REGCLASS (which turns it into a VR128 register). The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190	2016-05-01 08:40:00 +00:00
Craig Topper	5387c11293	[AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when VLX and BWI are supported. llvm-svn: 268189	2016-05-01 06:52:19 +00:00
Tom Stellard	6245c9db08	AMDGPU/SI: Remove wait state handling for SMRD in SIInsertWaits This was supposed to be part of r268143. llvm-svn: 268154	2016-04-30 04:04:48 +00:00
Tom Stellard	51b37329c1	AMDGPU/SI: Enable the post-ra scheduler Summary: This includes a hazard recognizer implementation to replace some of the hazard handling we had during frame index elimination. Reviewers: arsenm Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D18602 llvm-svn: 268143	2016-04-30 00:23:06 +00:00
Haicheng Wu	611abb7dc9	[MBP] Use Function::optForSize() instead of checking OptimizeForSize directly. Fix a FIXME. Disable loop alignment if compiled with -Oz now. llvm-svn: 268121	2016-04-29 22:01:10 +00:00
Matt Arsenault	42ea6294ae	AMDGPU: Fix crash with unreachable terminators. If a block has no successors because it ends in unreachable, this was accessing an invalid iterator. Also stop counting instructions that don't emit any real instructions. llvm-svn: 268119	2016-04-29 21:52:13 +00:00
Sriraman Tallam	899df7646a	Differential Revision: http://reviews.llvm.org/D19733 llvm-svn: 268106	2016-04-29 21:19:16 +00:00
Matt Arsenault	87a15c33eb	AMDGPU: Add kernarg.segment.ptr intrinsic llvm-svn: 268105	2016-04-29 21:16:52 +00:00
Matt Arsenault	1e65ead116	DAGCombiner: Reduce truncated shl width llvm-svn: 268094	2016-04-29 19:53:16 +00:00
Guozhi Wei	194cccd63b	[PPC] Enable shuffling of VSX vectors This patch fixes PR27078 by enabling shuffling of vectors if VSX is available. llvm-svn: 268064	2016-04-29 17:00:54 +00:00
Simon Dardis	ccaff6aa52	[mips][FastISel] A store is not a load. Correct trivial error. One of the failing tests from PR/27458. Reviewers: dsanders, vkalintiris, mcrosier Differential Review: http://reviews.llvm.org/D19726 llvm-svn: 268053	2016-04-29 16:07:47 +00:00
Krzysztof Parzyszek	d4659dc8ea	[Hexagon] Optimize addressing modes for load/store Patch by Jyotsna Verma. llvm-svn: 268051	2016-04-29 15:49:13 +00:00
Tom Stellard	33134ca52e	AMDGPU/SI: Add offset field to ds_permute/ds_bpermute instructions Summary: These instructions can add an immediate offset to the address, like other ds instructions. Reviewers: arsenm Subscribers: arsenm, scchan Differential Revision: http://reviews.llvm.org/D19233 llvm-svn: 268043	2016-04-29 14:34:26 +00:00
Nikolay Haustov	048a920e0e	AMDGPU/SI: Assembler: Unify parsing/printing of operands. Summary: The goal is for each operand type to have its own parse function and at the same time share common code for tracking state as different instruction types share operand types (e.g. glc/glc_flat, etc). Introduce parseAMDGPUOperand which can parse any optional operand. DPP and Clamp/OMod have custom handling for now. Sam also suggested to have class hierarchy for operand types instead of table. This can be done in separate change. Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps, parseMubufOptionalOps, parseDPPOptionalOps. Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class. Rename AsmMatcher/InstPrinter methods accordingly. Print immediate type when printing parsed immediate operand. Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3). Update tests. Reviewers: tstellarAMD, SamWot, artem.tamazov Subscribers: qcolombet, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19584 llvm-svn: 268015	2016-04-29 09:02:30 +00:00
Matthias Braun	5d4a43cf37	RegisterPressure: Fix default lanemask for missing regunit intervals In case of missing live intervals for a physical register, getLanesWithProperty() would report 0, which was not a safe default in all situations. Add a parameter to pass in a safe default. No testcase because in-tree targets do not skip computing register unit live intervals. Also clean up the getXXX() functions to not perform the RequireLiveIntervals checks anymore so we do not even need to return safe defaults. llvm-svn: 267977	2016-04-29 02:44:54 +00:00
Marcin Koscielnicki	832d560a7e	[PowerPC] Fix the EH_SjLj_Setup pseudo. This instruction is just a control flow marker - it should not actually exist in the object file. Unfortunately, nothing catches it before it gets to AsmPrinter. If integrated assembler is used, it's considered to be a normal 4-byte instruction, and emitted as an all-0 word, crashing the program. With external assembler, a comment is emitted. Fixed by setting Size to 0 and handling it in MCCodeEmitter - this means the comment will still be emitted if integrated assembler is not used. This broke an ASan test, which has been disabled for a long time as a result (see the discussion on D19657). We can reenable it once this lands. llvm-svn: 267943	2016-04-28 21:24:37 +00:00
Krzysztof Parzyszek	89f6a784c5	[RDF] Improve handling of inline-asm - Keep implicit defs from inline-asm instructions. - Treat register references from inline-asm as fixed. llvm-svn: 267936	2016-04-28 20:33:33 +00:00
Matt Arsenault	28f0a3fe58	AMDGPU: Emit error if too much LDS is used llvm-svn: 267922	2016-04-28 19:37:35 +00:00
Krzysztof Parzyszek	f555db9453	Reset the TopRPTracker's position in ScheduleDAGMILive::initQueues ScheduleDAGMI::initQueues changes the RegionBegin to the first non-debug instruction. Since it does not track register pressure, it does not affect any RP trackers. ScheduleDAGMILive inherits initQueues from ScheduleDAGMI, and it does reset the TopRPTracker in its schedule method. Any derived, target-specific scheduler will need to do it as well, but the TopRPTracker is only exposed as a "const" object to derived classes. Without the ability to modify the tracker directly, this leaves a derived scheduler with a potential of having the TopRPTracker out-of-sync with the CurrentTop. The symptom of the problem: void llvm::ScheduleDAGMILive::scheduleMI(llvm::SUnit *, bool): Assertion `TopRPTracker.getPos() == CurrentTop && "out of sync"' failed. Differential Revision: http://reviews.llvm.org/D19438 llvm-svn: 267918	2016-04-28 19:17:44 +00:00
Matt Arsenault	f94836045a	AMDGPU: Fix mishandling array allocations when promoting alloca The canonical form for allocas is a single allocation of the array type. In case we see a non-canonical array alloca, make sure we aren't replacing this with an array N times smaller. llvm-svn: 267916	2016-04-28 18:38:48 +00:00
Simon Dardis	156870a1a4	[mips][atomics] Fix partword atomic binary operation implementation Currently Mips::emitAtomicBinaryPartword() does not properly respect the width of pointers. For MIPS64 this causes the memory address that the ll/sc sequence uses to be truncated. At runtime this causes a segmentation fault. This can be fixed by applying similar changes as r266204, so that a full 64bit pointer is loaded. Reviewers: dsanders Differential Review: http://reviews.llvm.org/D19651 llvm-svn: 267900	2016-04-28 16:26:43 +00:00
Krzysztof Parzyszek	49d1f997e6	[RDF] Handle undefined registers in RDF copy propagation When updating the graph, make sure that new uses without reaching defs are handled correctly. llvm-svn: 267891	2016-04-28 15:09:19 +00:00
Matthias Braun	18562ab366	CodeGen: Add DetectDeadLanes pass. The DetectDeadLanes pass performs a dataflow analysis of used/defined subregister lanes across COPY instructions and instructions that will get lowered to copies. It detects dead definitions and uses reading undefined values which are obscured by COPY and subregister usage. These dead definitions cause trouble in the register coalescer which cannot deal with definitions suddenly becoming dead after coalescing COPY instructions. For now the pass only adds dead and undef flags to machine operands. It should be possible to extend it in the future to remove the dead instructions and redo the analysis for the affected virtual registers. Differential Revision: http://reviews.llvm.org/D18427 llvm-svn: 267851	2016-04-28 03:07:16 +00:00
Bryan Chan	2567ab558c	[SystemZ] Support Swift Calling Convention Summary: Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see: RFC: Implementing the Swift calling convention in LLVM and Clang https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0 Reviewers: kbarton, manmanren, rjmccall, uweigand Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19414 llvm-svn: 267823	2016-04-28 00:17:23 +00:00
Mitch Bodart	fde6615c97	[X86] Enable the post-RA-scheduler for clang's default 32-bit cpu. For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809	2016-04-27 22:52:35 +00:00
Quentin Colombet	d6bb035737	[X86][FastISel] Make sure we use the right register class when we select stores. llvm-svn: 267806	2016-04-27 22:33:42 +00:00
Quentin Colombet	96d6f82ab0	[X86] Fix the lowering of TLS calls. The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI, the pseudo uses the symbol address at this point not RDI and the lowering will do the right thing. llvm-svn: 267797	2016-04-27 21:37:37 +00:00
Matt Arsenault	982e737c85	AMDGPU: Account for globals in AMDGPUPromoteAlloca pass Patch by Bas Nieuwenhuizen llvm-svn: 267791	2016-04-27 21:05:08 +00:00
Ahmed Bougacha	e8bff14c32	[AArch64] Set correct successors in CMPXCHG pseudo expansion. transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. Follow-up to r266339. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267779	2016-04-27 20:33:02 +00:00
Ahmed Bougacha	492c1a346a	[ARM] Set correct successors in CMPXCHG pseudo expansion. transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas it should be a successor of the original MBB. The testcase changes are caused by Thumb2SizeReduction, which was previously confused by the broken CFG. Follow-up to r266679. Unfortunately, it's tricky to catch this in the verifier. llvm-svn: 267778	2016-04-27 20:32:54 +00:00
Kevin B. Smith	1783031f2d	[X86]: Quit promoting 16 bit loads to 32 bit. Differential Revision: http://reviews.llvm.org/D19592 llvm-svn: 267773	2016-04-27 19:58:03 +00:00
Marcin Koscielnicki	1e17bfd3e5	[Mips] Add support for llvm.thread.pointer intrinsic. This will be used to implement __builtin_thread_pointer in clang. Differential Revision: http://reviews.llvm.org/D19569 llvm-svn: 267743	2016-04-27 17:21:49 +00:00
Nicolai Haehnle	494b4aee1e	AMDGPU/SI: Add llvm.amdgcn.s.waitcnt.all intrinsic Summary: So it appears that to guarantee some of the ordering requirements of a GLSL memoryBarrier() executed in the shader, we need to emit an s_waitcnt. (We can't use an s_barrier, because memoryBarrier() may appear anywhere in the shader, in particular it may appear in non-uniform control flow.) Reviewers: arsenm, mareko, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19203 llvm-svn: 267729	2016-04-27 15:46:01 +00:00
Artem Tamazov	0b6855273a	[AMDGPU][llvm-mc] s_getreg/setreg* - Support symbolic names of hardware registers. Possibility to specify code of hardware register kept. Disassemble to symbolic name, if name is known. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19335 llvm-svn: 267724	2016-04-27 15:17:03 +00:00
Nico Weber	b519b357d0	Revert r267649, it caused PR27539. llvm-svn: 267723	2016-04-27 15:16:54 +00:00
Zlatko Buljan	92f1550331	[mips][microMIPS] Add CodeGen support for SUBU16, SUB, SUBU, DSUB and DSUBU instructions Differential Revision: http://reviews.llvm.org/D16676 llvm-svn: 267694	2016-04-27 11:31:44 +00:00
Zlatko Buljan	a2323fb2af	[mips][microMIPS] Add CodeGen support for SLL16, SRL16, SLL, SLLV, SRA, SRAV, SRL and SRLV instructions Differential Revision: http://reviews.llvm.org/D17989 llvm-svn: 267693	2016-04-27 11:02:23 +00:00
Chuang-Yu Cheng	d389efdf8e	[ppc64] Fix bug in prologue: mfocrf's cr operand should be explicit state instead of implicit. This fixes PR27414 Reviewers: kbarton mgrang tjablin http://reviews.llvm.org/D19255 llvm-svn: 267660	2016-04-27 02:59:28 +00:00
Ahmed Bougacha	991d42e979	[X86] Don't assume that MMX extractelts are from index 0. It's probably the case for all 3 MMX users out there, but with hand-crafted IR, you can trigger selection failures. Fix that. llvm-svn: 267652	2016-04-27 01:35:29 +00:00
Ahmed Bougacha	208a5db302	[X86] Re-enable MMX i32 extractelt combine. This effectively adds back the extractelt combine removed by r262358: the direct case can still occur (because x86_mmx is special, see r262446), but it's the indirect case that's now superseded by the generic combine. llvm-svn: 267651	2016-04-27 01:35:25 +00:00
Cong Hou	3dea148bfe	Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649	2016-04-27 01:29:18 +00:00
Quentin Colombet	c01de3fc6a	[X86] Make sure it is safe to clobber EFLAGS, if need be, when choosing the prologue. Do not use basic blocks that have EFLAGS live-in as prologue if we need to realign the stack. Realigning the stack uses an AND instruction and this clobbers EFLAGS. Another alternative would have been to save and restore EFLAGS around the stack realignment code, but this is likely inefficient. Fixes PR27531. llvm-svn: 267634	2016-04-26 23:44:14 +00:00
Mitch Bodart	d1778e20f3	[X86] Replace -mcpu with -mattr in several tests Differential Revision: http://reviews.llvm.org/D19568 llvm-svn: 267629	2016-04-26 23:36:38 +00:00
Quentin Colombet	67573257d1	[MachineBasicBlock] Take advantage of the partially dead information. Thanks to that information we no longer report a register as live when it is not. llvm-svn: 267622	2016-04-26 23:14:29 +00:00
Quentin Colombet	c2937566b8	[MachineInstrBundle] Improve the recognition of dead definitions. Now, it is possible to know that partial definitions are dead definitions and recognize that clobbered registers are also dead. llvm-svn: 267621	2016-04-26 23:14:24 +00:00
Krzysztof Parzyszek	8d29c2a6a5	[Tail duplication] Handle source registers with subregisters When a block is tail-duplicated, the PHI nodes from that block are replaced with appropriate COPY instructions. When those PHI nodes contained use operands with subregisters, the subregisters were dropped from the COPY instructions, resulting in incorrect code. Keep track of the subregister information and use this information when remapping instructions from the duplicated block. Differential Revision: http://reviews.llvm.org/D19337 llvm-svn: 267583	2016-04-26 18:36:34 +00:00
Manman Ren	e3c0ba8445	Swift Calling Convention: use %RAX for sret. We don't need to copy the sret argument into %rax upon return. rdar://25671494 llvm-svn: 267579	2016-04-26 18:08:06 +00:00
Saleem Abdulrasool	63a10bee58	tests: tweak MIR for ARM tests to correct MI issues The Machine Instruction Verifier flagged some issues in the serialized MIR. Adjust the input to correct them. Fixes the remaining portion of PR27480. llvm-svn: 267578	2016-04-26 17:54:21 +00:00
Saleem Abdulrasool	4348c4ad02	test: remove some bleeding whitespace Kill bleeding whitespace. NFC llvm-svn: 267577	2016-04-26 17:54:16 +00:00
Sanjay Patel	7177ce0576	[CodeGenPrepare] use branch weight metadata to decide if a select should be turned into a branch This is part of solving PR27344: https://llvm.org/bugs/show_bug.cgi?id=27344 CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the IR further with a select in place. For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly predictable branch. Even the most limited HW branch predictors will be correct on this branch almost all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all the times the branch was predicted correctly. As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's MispredictPenalty. Or we could just let targets override the default by implementing the hook with that and other target-specific options. Note that trying to statically determine mispredict rates for close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 50/50 taken/not-taken might still be 100% predictable. Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable() branch weight default values are 4 and 64. A proposal to change that is in D19435. Differential Revision: http://reviews.llvm.org/D19488 llvm-svn: 267572	2016-04-26 17:11:17 +00:00
Konstantin Zhuravlyov	a8b24aaab2	[AMDGPU] Reserve VGPRs for trap handler usage if instructed Differential Revision: http://reviews.llvm.org/D19235 llvm-svn: 267563	2016-04-26 15:43:14 +00:00
Andrey Turetskiy	53086bdd4e	[X86] PR27502: Fix the LEA optimization pass. Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass. Differential Revision: http://reviews.llvm.org/D19409 llvm-svn: 267551	2016-04-26 12:18:12 +00:00
Marcin Koscielnicki	704c818d77	[PowerPC] Add support for llvm.thread.pointer Differential Revision: http://reviews.llvm.org/D19304 llvm-svn: 267546	2016-04-26 10:37:22 +00:00
Marcin Koscielnicki	599068857b	[SPARC] [SSP] Add support for LOAD_STACK_GUARD. This fixes PR22248 on sparc. Differential Revision: http://reviews.llvm.org/D19386 llvm-svn: 267545	2016-04-26 10:37:14 +00:00
Marcin Koscielnicki	470e89bab4	[SPARC] Add support for llvm.thread.pointer. Differential Revision: http://reviews.llvm.org/D19387 llvm-svn: 267544	2016-04-26 10:37:01 +00:00
Craig Topper	b7db006017	[AArch64] Expand v1i64 and v2i64 ctlz. The default is legal, which results in 'Cannot select' errors. llvm-svn: 267522	2016-04-26 05:26:51 +00:00
Craig Topper	4513730c09	[ARM] Expand vector ctlz_zero_undef so it becomes ctlz. The default is Legal, which results in 'Cannot select' errors. llvm-svn: 267521	2016-04-26 05:04:37 +00:00
Craig Topper	783834f3cf	[ARM] Expand v1i64 and v2i64 ctlz. The default is legal, which results in 'Cannot select' errors. llvm-svn: 267520	2016-04-26 05:04:33 +00:00
Richard Trieu	0f158489d6	Pass the test file in through stdin instead of by filename. When passed in via filename, this test will fail if the path to the test contains the strings "f1" and "f2" anywhere. Pass the file through stdin to prevent test failures due to coincidences in path names. llvm-svn: 267517	2016-04-26 03:43:49 +00:00
Dan Gohman	2ad6a4c0e6	[WebAssembly] Account for implicit operands when computing operand indices. llvm-svn: 267511	2016-04-26 01:40:56 +00:00
Ahmed Bougacha	b6c12fe106	[X86] Use LivePhysRegs in X86FixupBWInsts. Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495	2016-04-26 00:00:48 +00:00
James Y Knight	439f0092c7	[Sparc] Fix double-float fabs and fneg on little endian CPUs. The SparcV8 fneg and fabs instructions interestingly come only in a single-float variant. Since the sign bit is always the topmost bit no matter what size float it is, you simply operate on the high subregister, as if it were a single float. However, the layout of double-floats in the float registers is reversed on little-endian CPUs, so that the high bits are in the second subregister, rather than the first. Thus, this expansion must check the endianness to use the correct subregister. llvm-svn: 267489	2016-04-25 22:54:09 +00:00
Tim Northover	66f8d5ae59	ARM: put extern __thread stubs in a special section. The linker needs to know that the symbols are thread-local to do its job properly. llvm-svn: 267473	2016-04-25 21:12:04 +00:00
Quentin Colombet	7f8c56085e	Re-apply r267206 with a fix for the encoding problem: when the immediate of log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit variant cannot encode it. Therefore, set the subreg part accordingly. [AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the active bits in the AND masks, but on the AND opcode itself. Indeed, we need to use an X or W variant based on the AND variant, not on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267465	2016-04-25 20:54:08 +00:00
Matt Arsenault	b60850cb10	AMDGPU: Implement addrspacecast llvm-svn: 267452	2016-04-25 19:27:24 +00:00
Matt Arsenault	524b24258c	AMDGPU: Add queue ptr intrinsic llvm-svn: 267451	2016-04-25 19:27:18 +00:00
Krzysztof Parzyszek	7bc31501a1	[Hexagon] Register save/restore functions do not follow regular conventions Do not mark them as modifying any of the volatile registers by default. llvm-svn: 267433	2016-04-25 17:49:44 +00:00
Sanjay Patel	2bdc7d43ef	add tests for potential CGP transform (PR27344) llvm-svn: 267426	2016-04-25 16:56:52 +00:00
Marcin Koscielnicki	de3ced2d10	[PR27390] [CodeGen] Reject indexed loads in CombinerDAG. visitAND, when folding and (load) forgets to check which output of an indexed load is involved, happily folding the updated address output on the following testcase: target datalayout = "e-m:e-i64:64-n32:64" target triple = "powerpc64le-unknown-linux-gnu" %typ = type { i32, i32 } define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) { %b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1 %1 = load i32, i32* %b, align 4 %2 = ptrtoint i32* %b to i64 %3 = and i64 %2, -35184372088833 %4 = inttoptr i64 %3 to i32* %_msld = load i32, i32* %4, align 4 %zzz = add i32 %1, %_msld ret i32 %zzz } Fix this by checking ResNo. I've found a few more places that currently neglect to check for indexed load, and tightened them up as well, but I don't have test cases for them. In fact, they might not be triggerable at all, at least with current targets. Still, better safe than sorry. Differential Revision: http://reviews.llvm.org/D19202 llvm-svn: 267420	2016-04-25 15:43:44 +00:00
Hrvoje Varga	e25aaa57ef	[mips][microMIPS] Revert commit r267137 Commit r267137 was the reason for failing tests in LLVM test suite. llvm-svn: 267419	2016-04-25 15:40:08 +00:00
Sanjay Patel	74d1b00e49	[x86] auto-generate checks for cmov tests llvm-svn: 267417	2016-04-25 15:26:57 +00:00
David Majnemer	c78d15d0c3	[WinEH] Update SplitAnalysis::computeLastSplitPoint to cope with multiple EH successors We didn't have logic to correctly handle CFGs where there was more than one EH-pad successor (these are novel with WinEH). There were situations where a register was live in one exceptional successor but not another but the code as written would only consider the first exceptional successor it found. This resulted in split points which were insufficiently early if an invoke was present. This fixes PR27501. N.B. This removes getLandingPadSuccessor. llvm-svn: 267412	2016-04-25 14:31:32 +00:00
Silviu Baranga	6c665bec7d	[ARM] Add support for the X asm constraint Summary: This patch adds support for the X asm constraint. To do this, we lower the constraint to either a "w" or "r" constraint depending on the operand type (both constraints are supported on ARM). Fixes PR26493 Reviewers: t.p.northover, echristo, rengolin Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D19061 llvm-svn: 267411	2016-04-25 14:29:18 +00:00
Artem Tamazov	7fa01faba1	[AMDGPU][llvm-mc] s_getreg/setreg* - Add hwreg(...) syntax. Added hwreg(reg[,offset,width]) syntax. Default offset = 0, default width = 32. Possibility to specify 16-bit immediate kept. Added out-of-range checks. Disassembling is always to hwreg(...) format. Tests updated/added. Differential Revision: http://reviews.llvm.org/D19329 llvm-svn: 267410	2016-04-25 14:13:51 +00:00
Marcin Koscielnicki	5a59940bb3	[PowerPC] [PR27387] Disallow r0 for ADD8TLS. ADD8TLS, a variant of add instruction used for initial-exec TLS, currently accepts r0 as a source register. While add itself supports r0 just fine, linker can relax it to a local-exec sequence, converting it to addi - which doesn't support r0. Differential Revision: http://reviews.llvm.org/D19193 llvm-svn: 267388	2016-04-25 09:24:34 +00:00
Michael Zuckerman	4cc752e542	Fixing wrong mask size error. From __mmask8 to __mmask16. Was reviewed over the shoulder by AsafBadouh. Connected to review http://reviews.llvm.org/D19195. llvm-svn: 267379	2016-04-25 05:27:51 +00:00
Craig Topper	65fd322915	[X86] Add a complete set of tests for all operand sizes of cttz/ctlz with and without zero undef being lowered to bsf/bsr. llvm-svn: 267373	2016-04-25 01:01:15 +00:00
Simon Pilgrim	386d31614a	[X86][AVX] Added PR24935 test case llvm-svn: 267362	2016-04-24 20:30:48 +00:00
Saleem Abdulrasool	13a27b5a96	ARM: fix __chkstk Frame Setup on WoA This corrects the MI annotations for the stack adjustment following the __chkstk invocation. We were marking the original SP usage as a Def rather than Kill. The (new) assigned value is the definition, the original reference is killed. Adjust the ISelLowering to mark Kills and FrameSetup as well. This partially resolves PR27480. llvm-svn: 267361	2016-04-24 20:12:48 +00:00
Simon Pilgrim	30bc5e06bf	[X86][SSE] Added SSSE3/AVX/AVX2 BITREVERSE tests Codegen is pretty bad at the moment but could use PSHUFB quite efficiently llvm-svn: 267347	2016-04-24 15:45:06 +00:00
Simon Pilgrim	9549c9fa45	[X86][XOP] Fixed VPPERM permute op decoding (PR27472). Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346	2016-04-24 15:05:04 +00:00
Simon Pilgrim	1211a431ae	[X86][SSE] Improved support for decoding target shuffle masks through bitcasts Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343	2016-04-24 14:53:54 +00:00
Marcin Koscielnicki	6b999dbbc0	[SystemZ] [SSP] Add support for LOAD_STACK_GUARD. This fixes PR22248 on s390x. The previous attempt at this was D19101, which was before LOAD_STACK_GUARD existed. Compared to the previous version, this always emits a rather ugly block of 4 instructions, involving a thread pointer load that can't be shared with other potential users. However, this is necessary for SSP - spilling the guard value (or thread pointer used to load it) is counter to the goal, since it could be overwritten along with the frame it protects. Differential Revision: http://reviews.llvm.org/D19363 llvm-svn: 267340	2016-04-24 13:57:49 +00:00
Simon Pilgrim	79561eb759	[X86][SSE] Demonstrate issue with decoding shuffle masks that have been lowered as rematerialized constants on scalar unit Found whilst investigating PR27472 llvm-svn: 267339	2016-04-24 13:45:30 +00:00
Gerolf Hoflehner	f601ce3709	[MachineCombiner] Support for floating-point FMA on ARM64 (re-commit r267098) The original patch caused crashes because it could dereference a null pointer for SelectionDAGTargetInfo for targets that do not define it. Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency. - Support for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267328	2016-04-24 05:14:01 +00:00
Craig Topper	917b9d7a47	[X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311	2016-04-24 02:01:22 +00:00
Duncan P. N. Exon Smith	94512f24de	DebugInfo: Remove MDString-based type references Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around DIType*. It is no longer legal to refer to a DICompositeType by its 'identifier:', and DIBuilder no longer retains all types with an 'identifier:' automatically. Aside from the bitcode upgrade, this is mainly removing logic to resolve an MDString-based reference to an actual DIType. The commits leading up to this have made the implicit type map in DICompileUnit's 'retainedTypes:' field superfluous. This does not remove DITypeRef, DIScopeRef, DINodeRef, and DITypeRefArray, or stop using them in DI-related metadata. Although as of this commit they aren't serving a useful purpose, there are patches under review to reuse them for CodeView support. The tests in LLVM were updated with deref-typerefs.sh, which is attached to the thread "[RFC] Lazy-loading of debug info metadata": http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html llvm-svn: 267296	2016-04-23 21:08:00 +00:00
Renato Golin	d3f349208c	Revert "[AArch64] Fix optimizeCondBranch logic." This reverts commit r267206, as it broke self-hosting on AArch64. llvm-svn: 267294	2016-04-23 19:30:52 +00:00
Simon Pilgrim	581d3c8ed8	[X86][XOP] Added VPPERM -> BLEND-WITH-ZERO Test Currently failing due to poor blend matching, found whilst investigating PR27472 llvm-svn: 267282	2016-04-23 11:14:18 +00:00
Craig Topper	aebe1f85c0	[CodeGen] When promoting CTTZ operations to larger type, don't insert a select to detect if the input is zero to return the original size instead of the extended size. Instead just set the first bit in the zero extended part. llvm-svn: 267280	2016-04-23 05:20:47 +00:00
Matt Arsenault	3cbb6d7d74	AMDGPU: sext_inreg (srl x, K), vt -> bfe x, K, vt.Size llvm-svn: 267244	2016-04-22 22:59:16 +00:00
NAKAMURA Takumi	2e521b8375	Fix llvm/test/CodeGen/ARM/Windows/dbzchk.ll not to check mixed output, take #2 . llvm-svn: 267242	2016-04-22 22:51:48 +00:00
Matt Arsenault	b274219b5d	AMDGPU: Re-visit nodes in performAndCombine This fixes test regressions when i64 loads/stores are made promote. llvm-svn: 267240	2016-04-22 22:48:38 +00:00
Sriraman Tallam	bdcd2a52a2	Differential Revision: http://reviews.llvm.org/D19040 llvm-svn: 267229	2016-04-22 21:41:58 +00:00
Peter Collingbourne	d04766ba20	Introduce llvm.load.relative intrinsic. This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that value and returns it. The constant folder specifically recognizes the form of this intrinsic and the constant initializers it may load from; if a loaded constant initializer is known to have the form ``i32 trunc(x - %ptr)``, the intrinsic call is folded to ``x``. LLVM provides that the calculation of such a constant initializer will not overflow at link time under the medium code model if ``x`` is an ``unnamed_addr`` function. However, it does not provide this guarantee for a constant initializer folded into a function body. This intrinsic can be used to avoid the possibility of overflows when loading from such a constant. Differential Revision: http://reviews.llvm.org/D18367 llvm-svn: 267223	2016-04-22 21:18:02 +00:00
Matt Arsenault	953215125b	DAGCombiner: Relax alignment restriction when changing store type If the target allows the alignment, this should be OK. llvm-svn: 267217	2016-04-22 21:01:41 +00:00
Peter Collingbourne	df3f55c3ab	CodeGen: Use PLT relocations for relative references to unnamed_addr functions. The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211	2016-04-22 20:40:10 +00:00
Matt Arsenault	9b242e0d0e	DAGCombiner: Relax alignment restriction when changing load type If the target allows the alignment, this should still be OK. llvm-svn: 267209	2016-04-22 20:21:36 +00:00
Quentin Colombet	dfaf1211a5	[AArch64] Fix optimizeCondBranch logic. The opcode for the optimized branch does not depend on the size of the activate bits in the AND masks, but the AND opcode itself. Indeed, we need to use a X or W variant based on the AND variant not based on whether the mask fits into the related variant. Otherwise, we may end up using the W variant of the optimized branch for 64-bit register inputs! This fixes the last make check verifier issues for AArch64: PR27479. llvm-svn: 267206	2016-04-22 20:09:58 +00:00
Matthias Braun	97996c46e1	MachineScheduler: Limit the size of the ready list. Avoid quadratic complexity in unusually large basic blocks by limiting the size of the ready lists. Differential Revision: http://reviews.llvm.org/D19349 llvm-svn: 267189	2016-04-22 19:09:17 +00:00
Quentin Colombet	e89cf71564	[AArch64] When creating MRS instruction, make sure the destination register is declared as a definition. This fixes the machine verifier error for CodeGen/AArch64/nzcv-save.ll. llvm-svn: 267185	2016-04-22 18:46:17 +00:00
Quentin Colombet	e4d08c7e23	[AArch64][AdvSIMDScalar] Update the kill flags correctly. We used to simply set the kill flags to true when transforming a scalar instruction to a vector one. SrcScalar1 = copy SrcVector1 ... = opScalar SrcScalar1 => SrcScalar1 = copy SrcVector1 ... = opVector SrcVector1<kill> This is obviously wrong. The proper update consists in: 1. Propagate the kill status from the copy to the new opVector 2. Reset the kill status on the copy, since the live-range of SrcVector1 got extended. This fixes some of the machine verifier errors for AArch64 with make check. llvm-svn: 267180	2016-04-22 18:09:14 +00:00
Saleem Abdulrasool	deec7429ad	test: split test into two runs Rather than checking both stdout and stderr simultaneously, split it into two tests. This apparently breaks on Windows where MSVCRT does not buffer output correctly. NFC. Thanks to chapuni for bringing the issue to my attention! llvm-svn: 267179	2016-04-22 18:06:51 +00:00
Krzysztof Parzyszek	78673d03bc	[Hexagon] Properly close live range in HexagonBlockRanges ---add testcase llvm-svn: 267174	2016-04-22 17:30:13 +00:00
Konstantin Zhuravlyov	613d1574ee	[AMDGPU] Insert nop pass: take care of outstanding feedback - Switch few loops to range-based for loops - Fix nop insertion at the end of BB - Fix formatting - Check for endpgm Differential Revision: http://reviews.llvm.org/D19380 llvm-svn: 267167	2016-04-22 17:04:51 +00:00
Krzysztof Parzyszek	e1e99be481	[Hexagon] Teach mux expansion how to deal with undef predicates llvm-svn: 267165	2016-04-22 16:47:01 +00:00
Nirav Dave	2e439346e9	Emit code16 in assembly in 16-bit mode Summary: When generating assembly using -m16 we must explicitly mark it as 16-bit. Emit .code16 at beginning of file. Fixes wrong results when using -fno-integrated-as. Reviewers: dwmw2 Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19392 llvm-svn: 267152	2016-04-22 13:36:11 +00:00
Simon Dardis	c64ea03044	[mips] Fix select patterns for MIPS64 When targeting MIPS64R6 some of the patterns for select were guarded by a broken predicate. The predicate was supposed to test if a constant value could fit in a 16 bit zero-extended field. Instead the value was tested to fit in a 16 bit sign-extended field. For negative constants of native word width this resulted in wrong code generation. Reviewers: vkalintiris, dsanders Differential Review: http://reviews.llvm.org/D19378 llvm-svn: 267151	2016-04-22 13:19:22 +00:00
Hrvoje Varga	0102a446f7	[mips][microMIPS] Implement SLT, SLTI, SLTIU, SLTU microMIPS32r6 instructions Differential Revision: http://reviews.llvm.org/D19354 llvm-svn: 267137	2016-04-22 11:18:40 +00:00
Daniel Sanders	ffe901fc26	Revert r267098 - [MachineCombiner] Support for floating-point FMA on ARM64 It introduced buildbot failures on clang-cmake-mips, clang-ppc64le-linux, among others. llvm-svn: 267127	2016-04-22 09:37:26 +00:00
Nicolai Haehnle	f54e57a212	AMDGPU/SI: add llvm.amdgcn.ps.live intrinsic Summary: This intrinsic returns true if the current thread belongs to a live pixel and false if it belongs to a pixel that we are executing only for derivative computation. It will be used by Mesa to implement gl_HelperInvocation. Note that for pixels that are killed during the shader, this implementation also returns true, but it doesn't matter because those pixels are always disabled in the EXEC mask. This unearthed a corner case in the instruction verifier, which complained about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but correct code, so make the verifier accept it as such. Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19191 llvm-svn: 267102	2016-04-22 04:04:08 +00:00
Craig Topper	6ea4b97a42	[AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ. llvm-svn: 267100	2016-04-22 03:22:38 +00:00
Gerolf Hoflehner	d63bafa58b	[MachineCombiner] Support for floating-point FMA on ARM64 Evaluates fmul+fadd -> fmadd combines and similar code sequences in the machine combiner. It adds support for float and double similar to the existing integer implementation. The key features are: - DAGCombiner checks whether it should combine greedily or let the machine combiner do the evaluation. This is only supported on ARM64. - It gives preference to throughput over latency: the heuristic used is to combine always in loops. The target decides whether the machine combiner should optimize for throughput or latency. - Support for fmadd, f(n)msub, fmla, fmls patterns - On by default at O3 ffast-math llvm-svn: 267098	2016-04-22 02:15:19 +00:00
Nico Weber	84e1414d06	Try to fix UNRESOLVED: LLVM :: CodeGen/AArch64/arm64-regress-opt-cmp.s on bots. This test used to write a .s file until r266971 fixed that. But on most bots, the .s file still exists. Add an rm statement to clean up the bots. In a few days, this statement can go away again. llvm-svn: 267095	2016-04-22 01:08:56 +00:00
Saleem Abdulrasool	eed028bbcd	ARM: fix test for Windows division This was meant to be part of SVN r267080. cbz cannot use a high register, which would be silently truncated. This has now been fixed. llvm-svn: 267092	2016-04-22 01:03:38 +00:00
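
Illustration for r267280 (aebe1f85c0, CTTZ promotion): the commit message says that when CTTZ is promoted to a larger type, the first bit of the zero-extended part is set instead of inserting a select for a zero input. The Python sketch below models that idea on plain integers. It is not LLVM code; the function names `cttz` and `promoted_cttz` and the 8-bit to 32-bit widths are assumptions chosen for illustration.

```python
def cttz(value: int, width: int) -> int:
    """Count trailing zeros of a width-bit value; returns width when the value is 0."""
    value &= (1 << width) - 1
    if value == 0:
        return width
    # value & -value isolates the lowest set bit; its bit_length minus 1 is its index.
    return (value & -value).bit_length() - 1


def promoted_cttz(value: int, narrow: int, wide: int) -> int:
    """Compute a narrow-bit cttz using only a wide-bit cttz, without a select.

    Hypothetical model of the promotion described in the commit message.
    """
    extended = value & ((1 << narrow) - 1)  # zero-extend the narrow value
    guarded = extended | (1 << narrow)      # set the first bit of the extended part
    return cttz(guarded, wide)


if __name__ == "__main__":
    # The guard bit sits above every bit of the narrow value, so it only
    # affects the result when the narrow value is zero.
    for v in (0, 1, 8, 128):
        print(v, promoted_cttz(v, 8, 32))
```

For a zero input the guard bit is the lowest set bit, so the wide count equals the narrow width, which is the value the select would otherwise have produced.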

15839 Commits