llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00

Author	SHA1	Message	Date
Nirav Dave	3bc0a0e24a	[ARM][MC] Cleanup ARM Target Assembly Parser Summary: Correctly parse end-of-statement tokens and handle preprocessor end-of-line comments in ARM assembly processor. Reviewers: rnk, majnemer Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D26152 llvm-svn: 285830	2016-11-02 16:22:51 +00:00
Vasileios Kalintiris	a63b0d284c	[mips] Always run the MipsOptimizePICCall pass. Summary: Remove this pass from addMachineSSAOptimization() and register it unconditionally in through addPreRegAlloc(). This pass is required for generating correct PIC calls. Reviewers: sdardis Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26036 llvm-svn: 285814	2016-11-02 15:11:27 +00:00
Joerg Sonnenberger	c71771b919	Create the virtual register for the global base in the intersection of GPRC and GPRC_NOR0 (or the 64bit equivalent) and not just the latter. GPRC_NOR0 contains ZERO as alternative meaning of r0 and is therefore not a true subclass of GPRC. llvm-svn: 285813	2016-11-02 15:00:31 +00:00
Aaron Ballman	fb1ac9442e	Removing a switch statement that contains a default label, but no case labels. Silences an MSVC warning; NFC. llvm-svn: 285806	2016-11-02 13:58:57 +00:00
Ulrich Weigand	f5a6992cfd	[SystemZ] Fix compiler warnings introduced by r285574 SystemZAsmParser::parseOperand returns a bool, not an enum. llvm-svn: 285800	2016-11-02 11:32:28 +00:00
Kirill Bobyrev	1b49e93486	[llvm] FIx if-clause -Wmisleading-indentation issue. While bootstrapping Clang with recent `gcc 6.2.0` I found a bug related to misleading indentation. I believe, a pair of `{}` was forgotten, especially given the above similar piece of code: ``` if (!RDef \|\| !HII->isPredicable(*RDef)) { Done = coalesceRegisters(RD, RegisterRef(S1)); if (Done) { UpdRegs.insert(RD.Reg); UpdRegs.insert(S1.getReg()); } } ``` Reviewers: kparzysz Differential Revision: https://reviews.llvm.org/D26204 llvm-svn: 285794	2016-11-02 10:00:40 +00:00
Dylan McKay	35c1d0025e	[AVR] Add instruction selection lowering code Summary: This adds AVRISelLowering.cpp Reviewers: arsenm, kparzysz Subscribers: llvm-commits, modocache, japaric, wdng, beanz, mgorny Differential Revision: https://reviews.llvm.org/D25034 llvm-svn: 285790	2016-11-02 06:47:40 +00:00
Peter Collingbourne	dad5df2ecb	Support: Remove MemoryObject and DataStreamer interfaces. These interfaces are no longer used. Differential Revision: https://reviews.llvm.org/D26222 llvm-svn: 285774	2016-11-02 00:08:37 +00:00
Alex Bradbury	2903b8e615	[RISCV] Add bare-bones RISC-V MCTargetDesc This is enough to compile and link but doesn't yet do anything particularly useful. Once an ASM parser and printer are added in the next two patches, the whole thing can be usefully tested. Differential Revision: https://reviews.llvm.org/D23562 llvm-svn: 285770	2016-11-01 23:47:30 +00:00
Alex Bradbury	b5fde80c0e	[RISCV 4/10] Add basic RISCV{InstrFormats,InstrInfo,RegisterInfo,}.td For now, only add instruction definitions for basic ALU operations. Our initial target is a working MC layer rather than codegen, so appropriate SelectionDAG patterns will come later. Differential Revision: https://reviews.llvm.org/D23561 llvm-svn: 285769	2016-11-01 23:40:28 +00:00
Matt Arsenault	3fd81027a1	AMDGPU: Handle CopyToReg in getOperandRegClass llvm-svn: 285768	2016-11-01 23:22:17 +00:00
Matt Arsenault	269a08f691	AMDGPU: Use brev for materializing SGPR constants This is already done with VGPR immediates and saves 4 bytes. llvm-svn: 285765	2016-11-01 23:14:20 +00:00
Matt Arsenault	930d9b8579	AMDGPU: Default to using scalar mov to materialize immediate This is the conservatively correct way because it's easy to move or replace a scalar immediate. This was incorrect in the case when the register class wasn't known from the static instruction definition, but still needed to be an SGPR. The main example of this is inlineasm has an SGPR constraint. Also start verifying the register classes of inlineasm operands. llvm-svn: 285762	2016-11-01 22:55:07 +00:00
Matt Arsenault	3085819547	AMDGPU: Stop creating unused virtual registers These are only used in the spill to VMEM path. Move them to the one use. llvm-svn: 285756	2016-11-01 21:58:07 +00:00
Matt Arsenault	5a24f1da90	AMDGPU: Workaround for instruction size with literals Instructions with a 32-bit base encoding with an optional 32-bit literal encoded after them report their size as 4 for the disassembler. Consider these when computing the MachineInstr size. This fixes problems caused by size estimate consistency in BranchRelaxation. llvm-svn: 285743	2016-11-01 20:42:24 +00:00
Krzysztof Parzyszek	894aed3c86	[Hexagon] Rename operand/predicate names for unshifted integers For example, rename s6Ext to s6_0Ext. The names for shifted integers include the underscore and this will make the naming consistent. It also exposed a few duplicates that were removed. llvm-svn: 285728	2016-11-01 19:02:10 +00:00
Konstantin Zhuravlyov	f9b60c4ef0	[AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32 This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/ctlz.ll test/CodeGen/AMDGPU/ctlz_zero_undef.ll Differential Revision: https://reviews.llvm.org/D25802 llvm-svn: 285716	2016-11-01 17:49:33 +00:00
Alex Bradbury	4b9563cf55	[RISCV] Add stub backend This contains just enough for lib/Target/RISCV to compile. Notably a basic RISCVTargetMachine and RISCVTargetInfo. At this point you can attempt llc -march=riscv32 myinput.ll and will find it fails due to the lack of MCAsmInfo. See http://lists.llvm.org/pipermail/llvm-dev/2016-August/103748.html for further discussion Differential Revision: https://reviews.llvm.org/D23560 llvm-svn: 285712	2016-11-01 17:27:54 +00:00
Tom Stellard	dc07351d2b	AMDGPU: Fix buildbots broken by r285704 llvm-svn: 285711	2016-11-01 17:20:03 +00:00
Alex Bradbury	2fa138eff6	[TableGen] Move OperandMatchResultTy enum to MCTargetAsmParser.h As it stands, the OperandMatchResultTy is only included in the generated header if there is custom operand parsing. However, almost all backends make use of MatchOperand_Success and friends from OperandMatchResultTy for e.g. parseRegister. This is a pain when starting an AsmParser for a new backend that doesn't yet have custom operand parsing. Move the enum to MCTargetAsmParser.h. This patch is a prerequisite for D23563 Differential Revision: https://reviews.llvm.org/D23496 llvm-svn: 285705	2016-11-01 16:32:05 +00:00
Tom Stellard	f13abceda7	AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64 I wanted to implement this as a target independent expansion, however when targets say they want to expand FP_TO_FP16 what they actually want is the unsafe math expansion when possible and expansion to a libcall in all other cases. The only way to make this work as a target independent would be to add logic to target's TargetLowering construction to mark theses nodes as Expand when LegalizeDAG can use the unsafe expansion and mark them as LibCall when it cannot. I think this would be possible, but I think it would be too fragile and complex as it would require targets to keep their expansion logic up to date with the code in LegalizeDAG. Reviewers: bogner, ab, t.p.northover, arsenm Subscribers: wdng, llvm-commits, nhaehnle Differential Revision: https://reviews.llvm.org/D25999 llvm-svn: 285704	2016-11-01 16:31:48 +00:00
James Molloy	5538ea11d7	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables [Reapplying r284580 and r285917 with fix and testing to ensure emitted jump tables for Thumb-1 have 4-byte alignment] The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 285690	2016-11-01 13:37:41 +00:00
Valery Pykhtin	d1c9440bfa	[AMDGPU] Expand vector mulhu/mulhs Differential revision: https://reviews.llvm.org/D26077 llvm-svn: 285684	2016-11-01 10:26:48 +00:00
Nemanja Ivanovic	87dcd90e97	[PowerPC] Implement vector shift builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26095. Committing on behalf of Tony Jiang. llvm-svn: 285681	2016-11-01 09:42:32 +00:00
Matt Arsenault	bb971d2e8b	AMDGPU: Whitespace fixes llvm-svn: 285659	2016-11-01 00:55:14 +00:00
Davide Italiano	109eacd399	[Hexagon] Garbage collect dead code. llvm-svn: 285654	2016-10-31 22:56:56 +00:00
Saleem Abdulrasool	2e23098170	CodeGen: further loosen -O0 CG for WoA division Generate the slowest possible codepath for noopt CodeGen. Even trying to be clever with the negated jump can cause out-of-range jumps. Use a wide branch instead. Although the code is modelled simplistically, the later optimizations would recombine the branching into `cbz` if possible. This re-enables the previous optimization as well as hopefully gives us working code in all cases. Addresses PR30356! llvm-svn: 285649	2016-10-31 22:12:37 +00:00
Justin Lebar	63fe7a59a1	[NVPTX] Remove NVPTXFavorNonGenericAddrSpaces pass. Summary: This has been replaced by the NVPTXInferAddressSpaces pass. We've had the new one as the default with the old one accessible via a flag for some months now, and we've had no problems. Reviewers: tra Subscribers: llvm-commits, jholewinski, jingyue, mgorny Differential Revision: https://reviews.llvm.org/D26165 llvm-svn: 285642	2016-10-31 21:51:42 +00:00
Nemanja Ivanovic	cc5762c498	[PPC] add absolute difference altivec instructions and matching intrinsics This patch corresponds to review https://reviews.llvm.org/D26072. Committing on behalf of Sean Fertile. llvm-svn: 285627	2016-10-31 19:47:52 +00:00
Tim Northover	d7bf67d584	GlobalISel: allow truncating pointer casts on AArch64. llvm-svn: 285615	2016-10-31 18:31:09 +00:00
Tim Northover	f9cdcd6b12	GlobalISel: translate stack protector intrinsics llvm-svn: 285614	2016-10-31 18:30:59 +00:00
Michael Zuckerman	9770a3262e	[x86][inline-asm][AVX512][llvm][PART-2] Introducing "k" and "Yk" constraints for extended inline assembly, enabling use of AVX512 masked vectorized instructions. Commit on behalf of mharoush Extending inline assembly support, compatible with GCC as folowing: "k" constraint hints the compiler to select any of AVX512 k0-k7 registers. "Yk" constraint is a subset of "k" excluding k0 which is not allowd to be used as a mask. Reviewer: 1. rnk Differential Revision: https://reviews.llvm.org/D25062 llvm-svn: 285591	2016-10-31 16:19:58 +00:00
Artem Tamazov	0eaeb67ad9	[AMDGPU][MC][gfx8] Support 20-bit immediate offset in SMEM instructions. Fixes Bug 30808. Note that passing subtarget information to predicates seems too complicated, so gfx8-specific def smrd_offset_20 introduced. Old gfx6/7-specific def renamed to smrd_offset_8 for clarity. Lit tests updated. Differential Revision: https://reviews.llvm.org/D26085 llvm-svn: 285590	2016-10-31 16:07:39 +00:00
Krzysztof Parzyszek	61b1df30d2	[Hexagon] Don't expand mux instructions with both sources identical llvm-svn: 285588	2016-10-31 15:45:09 +00:00
Ulrich Weigand	2b52d3cc02	[SystemZ] Rework processor feature definitions and add -mcpu=archX support This patch implements two changes: - Move processor feature definition into a new file SystemZFeatures.td, and provide explicit lists of supported and unsupported features for each level of the z/Architecture. This allows specifying unsupported features in the scheduler definition files for each processor. - Add optional aliases for the -mcpu processor names according to the level of the z/Architecture, for compatibility with other compilers on the platform. The supported aliases are: -mcpu=arch8 equals -mcpu=z10 -mcpu=arch9 equals -mcpu=z196 -mcpu=arch10 equals -mcpu=zEC12 -mcpu=arch11 equals -mcpu=z13 llvm-svn: 285577	2016-10-31 14:33:29 +00:00
Ulrich Weigand	d631b85665	[SystemZ] Guard LEFR/LFER with FeatureVector The LEFR/LFER pseudos are aliases for vector instructions and should therefore be guared by FeatureVector. If they aren't, the TableGen scheduler definition checking might complain that there is no data for those pseudos for pre-z13 machines. No functional change intended. llvm-svn: 285576	2016-10-31 14:28:43 +00:00
Ulrich Weigand	4b6332419a	[SystemZ] Correctly diagnose missing features in AsmParser Currently, when using an instruction that is not supported on the currently selected architecture, the LLVM assembler is likely to diagnose an "invalid operand" instead of a "missing feature". This is because many operands require a custom parser in order to be processed correctly, and if an instruction is not available according to the current feature set, the generated parser code will also not detect the associated custom operand parsers. Fixed by temporarily enabling all features while parsing operands. The missing features will then be correctly detected when actually parsing the instruction itself. llvm-svn: 285575	2016-10-31 14:25:05 +00:00
Ulrich Weigand	4c3632fc8c	[SystemZ] Fix encoding of MVCK and .insn ss LLVM currently treats the first operand of MVCK as if it were a regular base+index+displacement address. However, it is in fact a base+displacement combined with a length register field. While the two might look syntactically similar, there are two semantic differences: - %r0 is a valid length register, even though it cannot be used as an index register. - In an expression with just a single register like 0(%rX), the register is treated as base with normal addresses, while it is treated as the length register (with an empty base) for MVCK. Fixed by adding a new operand parser class BDRAddr and reworking the assembler parser to distinguish between address + length register operands and regular addresses. llvm-svn: 285574	2016-10-31 14:21:36 +00:00
Jonas Paulsson	0de10fd097	[SystemZ] Model 2 VBU units (not 1) in SystemZScheduleZ13.td. NFC. Review: Ulrich Weigand. llvm-svn: 285566	2016-10-31 13:05:48 +00:00
Alexey Bataev	40602a91a6	Improved cost model for FDIV and FSQRT, by Andrew Tischenko There is a bug describing poor cost model for floating point operations: Bug 29083 - [X86][SSE] Improve costs for floating point operations. This patch is the second one in series of patches dealing with cost model. Differential Revision: https://reviews.llvm.org/D25722 llvm-svn: 285564	2016-10-31 12:10:53 +00:00
Craig Topper	5dc0456c5e	[AVX-512] Add missing patterns for selecting masked vector extracts that started from shuffles. llvm-svn: 285546	2016-10-31 05:55:57 +00:00
Craig Topper	571efd5344	[X86] Use intrinsics table for PMADDUBSW and PMADDWD so that we can use the legacy intrinsics to select EVEX encoded instructions when available. This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type. llvm-svn: 285515	2016-10-30 06:56:16 +00:00
Craig Topper	d475dcad95	[X86] Don't use loadv2i64 on SSE version of PMULHRSW. Use memopv2i64 instead. This bug was introduced in r285501. llvm-svn: 285510	2016-10-30 00:02:55 +00:00
Craig Topper	693d21e48c	[X86] Use intrinsics table for VPMULHRSW intrincis so that the legacy intrinsics can select EVEX encoded instructions when available. This requires a minor rename of the instructions due to the use of different tablegen classes and how the names are concatenated. llvm-svn: 285501	2016-10-29 18:41:45 +00:00
Elena Demikhovsky	c869a69914	Fixed FMA + FNEG combine. Masked form of FMA should be omitted in this optimization. Differential Revision: https://reviews.llvm.org/D25984 llvm-svn: 285492	2016-10-29 08:44:46 +00:00
Matt Arsenault	3ee7b5cf1b	AMDGPU: Use 1/2pi inline imm on VI I'm guessing at how it is supposed to be printed llvm-svn: 285490	2016-10-29 04:05:06 +00:00
Matthias Braun	6de13f8bcf	AArch64DeadRegisterDefinitionsPass: Cleanup; NFC - Fix doxygen file comment - reduce indentation in loop - Factor out some common subexpressions - Move independent helper function out of class - Fix Changed flag (this is not strictly NFC but a bugfix, but the flag seems ignored anyway) llvm-svn: 285488	2016-10-29 01:03:41 +00:00
Tom Stellard	94e161c5b2	AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions Summary: Flat instruction can return out of order, so we need always need to wait for all the outstanding flat operations. Reviewers: tony-tye, arsenm Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D25998 llvm-svn: 285479	2016-10-28 23:53:48 +00:00
Matt Arsenault	f557c5ed41	AMDGPU: Fix instruction flags for s_endpgm Set isReturn, remove hasSideEffects. Also remove hasCtrlDep, I'm not really sure what that does. llvm-svn: 285476	2016-10-28 23:00:38 +00:00
Matt Arsenault	a0090a0113	AMDGPU: Add definitions for scalar store instructions Also add glc bit to the scalar loads since they exist on VI and change the caching behavior. This currently has an assembler bug where the glc bit is incorrectly accepted on SI/CI which do not have it. llvm-svn: 285463	2016-10-28 21:55:15 +00:00
Matt Arsenault	342b4f2c0e	AMDGPU: Rename glc operand type While trying to add the glc bit to SMEM instructions on VI with the new refactoring I ran into some kind of shadowing problem for the glc operand when using the pseudoinstruction as a multiclass parameter. Everywhere that currently uses it defines the operand to have the same name as its type, i.e. glc:$glc which works. For some reason now it conflicts, and its up evaluating to the wrong thing. For the real encoding classes, let Inst{16} = !if(ps.has_glc, glc, ?); was not being evaluated and still visible in the Inst initializer in the expanded td file. In other cases I got a a different error about an illegal operand where this was using { 0 } initializer from the bits<1> glc initializer instead of evaluating it as false in the if. For consistency all of the operand types should probably be captialized to avoid conflicting with the variable names unless somebody has a better idea of how to fix this. llvm-svn: 285462	2016-10-28 21:55:08 +00:00
Justin Lebar	50ae991af5	[NVPTX] Compute 'rem' using the result of 'div', if possible. Summary: In isel, transform Num % Den into Num - (Num / Den) * Den if the result of Num / Den is already available. Reviewers: tra Subscribers: hfinkel, llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D26090 llvm-svn: 285461	2016-10-28 21:44:00 +00:00
Matt Arsenault	d438707f8d	AMDGPU: Diagnose using too many SGPRs This is possible when using inline asm. llvm-svn: 285447	2016-10-28 20:31:47 +00:00
Matt Arsenault	0536acc73b	AMDGPU: Fix using incorrect private resource with no allocation It's possible to have a use of the private resource descriptor or scratch wave offset registers even though there are no allocated stack objects. This would result in continuing to use the maximum number reserved registers. This could go over the number of SGPRs available on VI, or violate the SGPR limit requested by the function attributes. llvm-svn: 285435	2016-10-28 19:43:31 +00:00
Nemanja Ivanovic	2304e2b9df	Implement vector count leading/trailing bytes with zero lsb and vector parity builtins - llvm portion This patch corresponds to review https://reviews.llvm.org/D26003. Committing on behalf of Zaara Syeda. llvm-svn: 285434	2016-10-28 19:38:24 +00:00
Krzysztof Parzyszek	2ba0f4a54f	[Hexagon] Maintain kill flags through splitting in expand-condsets Do not use LiveIntervals to recalculate kills, because that cannot be done accurately without implicit uses on predicated instructions. llvm-svn: 285409	2016-10-28 15:50:22 +00:00
Tom Stellard	d0568d0818	AMDGPU/SI: Handle hazard with s_rfe_b64 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25638 llvm-svn: 285368	2016-10-27 23:50:21 +00:00
Tom Stellard	40599ea21b	AMDGPU/SI: Handle hazard with sgpr lane selects for v_{read,write}lane Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25637 llvm-svn: 285367	2016-10-27 23:42:29 +00:00
Tom Stellard	31deee10c4	AMDGPU/SI: Fix unused variable warning on non-debug builds llvm-svn: 285363	2016-10-27 23:28:03 +00:00
Tom Stellard	6924e02442	AMDGPU/SI: Handle hazard with > 8 byte VMEM stores Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25577 llvm-svn: 285359	2016-10-27 23:05:31 +00:00
Tom Stellard	f628d64420	AMDGPU/SI: Handle s_setreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25528 llvm-svn: 285338	2016-10-27 20:39:09 +00:00
Simon Pilgrim	68d206a9d2	[X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets llvm-svn: 285329	2016-10-27 18:32:06 +00:00
Simon Pilgrim	02a2816235	[X86][AVX512DQ] Move v2i64 and v4i64 MUL lowering to tablegen As suggested by @igorb on D26011 llvm-svn: 285313	2016-10-27 17:07:40 +00:00
Saleem Abdulrasool	5d2e860995	ARM: ensure that the Windows DBZ check is in range The Windows ARM target expects the compiler to emit a division-by-zero check. The check would use the form of: cmp r?, #0 cbz .Ltrap b .Lbody .Lbody: ... .Ltrap: udf #249 @ __brkdiv0 This works great most of the time. However, if the body of the function is greater than 127 bytes, the branch target limitation of cbz becomes an issue. This occurs in the unoptimized code generation cases sometimes (like in compiler-rt). Since this is a matter of correctness, possibly pay a small penalty instead. We now form this slightly differently: cbnz .Lbody udf #249 @ __brkdiv0 .Lbody: ... The positive case is through the branch instead of being the next instruction. However, because of the basic block layout, the negated branch is going to be a short distance always (2 bytes away, after the inserted __brkdiv0). The new t__brkdiv0 instruction is required to explicitly mark the instruction as a terminator as the generic UDF instruction is not a terminator. Addresses PR30532! llvm-svn: 285312	2016-10-27 16:59:22 +00:00
Vasileios Kalintiris	119c41482e	[mips] Do not allow -opt-bisect-limit to skip the PIC call optimization pass. r282428 added the MipsOptimizePICCall as an opt-in pass that can be skipped when using the -opt-bisect-limit option. However, this pass is needed because it generates code that conforms to the o32 ABI specification by using the $t9 register for PIC calls with JALR instructions. This bug was exposed by the fact that skipFunction() also checks for the "optnone" attribute. This caused functions with that attribute to break the requirements of the o32 ABI. llvm-svn: 285305	2016-10-27 15:50:36 +00:00
Simon Pilgrim	268d797e5a	[X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64 With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq). Updated cost table accordingly. Differential Revision: https://reviews.llvm.org/D26011 llvm-svn: 285304	2016-10-27 15:27:00 +00:00
Krzysztof Parzyszek	e14110d47f	[Hexagon] Do not expand ISD::SELECT for HVX vectors llvm-svn: 285297	2016-10-27 14:30:16 +00:00
Sam Parker	5118fae0d2	[ARM] Predicate UMAAL selection on hasDSP. UMAAL is a DSP instruction and it is not available on thumbv7m (Cortex-M3) and thumbv6m (Cortex-M0+1) targets. Also fix wrong CHECK prefix in longMAC.ll test. Patch by Vadzim Dambrouski. Differential Revision: https://reviews.llvm.org/D25890 llvm-svn: 285278	2016-10-27 09:47:10 +00:00
Dylan McKay	de46f31ff7	[AVR] Generate all of the TableGen files we need This enables generation of all of the TableGen files that are used downstream. llvm-svn: 285274	2016-10-27 08:20:47 +00:00
Nicolai Haehnle	1a1eaef0b2	AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies Summary: When finding a match for a merge and collecting the instructions that must be moved, keep in mind that the instruction we merge might actually use one of the defs that are being moved. Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect]. The fact that the ds_read in the test case is not eliminated suggests that there might be another problem related to alias analysis, but that's a separate problem: this pass should still work correctly even when earlier optimization passes missed something or were disabled. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25829 llvm-svn: 285273	2016-10-27 08:15:07 +00:00
Dylan McKay	261a06f328	[AVR] Compile the disassembler This also updates references of 'TheAVRTarget' to the new 'getTheAVRTarget()' method. llvm-svn: 285272	2016-10-27 08:09:15 +00:00
Dylan McKay	d6e1be8ce1	[AVR] Add AVRISelDAGToDAG.cpp Summary: This pulls the AVR instruction selector in-tree. Reviewers: arsenm, kparzysz Subscribers: llvm-commits, wdng, beanz, japaric, mgorny Differential Revision: https://reviews.llvm.org/D25278 llvm-svn: 285270	2016-10-27 07:03:47 +00:00
Dylan McKay	a02b1d7d51	[AVR] Add the machine code emitter Reviewers: arsenm, kparzysz Subscribers: wdng, beanz, japaric, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25388 llvm-svn: 285269	2016-10-27 06:56:46 +00:00
Nemanja Ivanovic	54552bfee6	[PowerPC] - No SExt/ZExt needed for count trailing zeros This patch corresponds to review: https://reviews.llvm.org/D25896 It just eliminates the redundant ZExt after a count trailing zeros instruction. llvm-svn: 285267	2016-10-27 05:17:58 +00:00
Evandro Menezes	194192393f	[AArch64] Create feature set for Samsung Exynos-M2 Since Exynos-M2 improved the FP square root unit a bit over the one in Exynos-M1, it does not benefit from using the Newton series for such operations. llvm-svn: 285246	2016-10-26 22:06:20 +00:00
Tim Northover	b805bcfa72	ARM: don't rely on push/pop reglists being in order when folding SP adjust. It would be a very nice invariant to rely on, but unfortunately it doesn't necessarily hold (and the causes of mis-sorted reglists appear to be quite varied) so to be robust the frame lowering code can't assume that the first register in the list is also the first one that actually gets pushed. Should fix an issue where we were turning something like: push {r8, r4, r7, lr} sub sp, #24 into nonsense like: push {r2, r3, r4, r5, r6, r7, r8, r4, r7, lr} llvm-svn: 285232	2016-10-26 20:01:00 +00:00
Nemanja Ivanovic	2991b16281	[PowerPC] Implement vec_insert_exp builtins - llvm portion This revision corresponds to review: https://reviews.llvm.org/D25957. Committing on behalf of Zaara Syeda. llvm-svn: 285225	2016-10-26 19:03:40 +00:00
Chad Rosier	c0750fd641	[AArch64] Avoid materializing constant 1 when generating cneg instructions. Instead of cmp w0, #1 orr w8, wzr, #0x1 cneg w0, w8, ne we now generate cmp w0, #1 csinv w0, w0, wzr, eq PR28965 llvm-svn: 285217	2016-10-26 18:15:32 +00:00
Dan Gohman	bc67d8a523	[WebAssembly] Update the README.txt. Update the README.txt with newer information, add a link to the Emscripten page explaining the current easiest way to use the LLVM wasm backend, and mention that other ways of using the LLVM wasm backend are in development. llvm-svn: 285215	2016-10-26 17:44:09 +00:00
Yaxun Liu	f73bd0c195	AMDGPU: Refactor processor definition to use ISA version features Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend. Refactor processor definition to use ISA version features. Fixed ISA version for stoney. Based on Laurent Morichetti's patch. Differential Revision: https://reviews.llvm.org/D25919 llvm-svn: 285210	2016-10-26 16:37:56 +00:00
Matt Arsenault	f048150fc7	Reapply "AMDGPU: Don't use offen if it is 0" This reverts r283003 llvm-svn: 285203	2016-10-26 15:08:16 +00:00
Matt Arsenault	76328cf17a	AMDGPU: Fix counting si_mask_branch as 4 bytes llvm-svn: 285202	2016-10-26 14:53:54 +00:00
Tom Stellard	9c82dbf162	AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch Summary: A single flat memory operations that might access the scratch buffer can only access MaxPrivateElementSize bytes. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25788 llvm-svn: 285198	2016-10-26 14:38:47 +00:00
Zvi Rackover	1f202ea0e7	[X86] AVX512 fallback for floating-point scalar selects Summary: In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches. Fixes pr30561 Reviewers: igorb, aymanmus, delena Differential Revision: https://reviews.llvm.org/D25310 llvm-svn: 285196	2016-10-26 14:12:46 +00:00
Craig Topper	6232f507e5	[AVX-512] Add scalar vfmsub/vfnmsub mask3 intrinsics Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions. Reviewers: igorb, delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25933 llvm-svn: 285173	2016-10-26 04:59:58 +00:00
James Y Knight	38133c4e94	[Sparc] Don't overlap variable-sized allocas with other stack variables. On SparcV8, it was previously the case that a variable-sized alloca might overlap by 4-bytes the last fixed stack variable, effectively because 92 (the number of bytes reserved for the register spill area) != 96 (the offset added to SP for where to start a DYNAMIC_STACKALLOC). It's not as simple as changing 96 to 92, because variables that should be 8-byte aligned would then be misaligned. For now, simply increase the allocation size by 8 bytes for each dynamic allocation -- wastes space, but at least doesn't overlap. As the large comment says, doing this more efficiently will require larger changes in llvm. Also adds some test cases showing that we continue to not support dynamic stack allocation and over-alignment in the same function. llvm-svn: 285131	2016-10-25 22:13:28 +00:00
Evandro Menezes	c2dbab9873	[AArch64] Adjust the cost model for Exynos M1. Modify the maximum jump table size. llvm-svn: 285106	2016-10-25 20:05:42 +00:00
Dan Gohman	be114b6520	[WebAssembly] Add immediate fields to call_indirect and memory operators. call_indirect, grow_memory, and current_memory now have immediate operands in the 0xd binary encoding. llvm-svn: 285085	2016-10-25 16:55:52 +00:00
Ulrich Weigand	126b083b49	[SystemZ] Do not use LOC(G) for volatile loads It is not safe to use LOAD ON CONDITION to implement access to a memory location marked "volatile", since the architecture leaves it unspecified whether or not an access happens if the condition is false. The current code already appears to care about that: def LOC : CondUnaryRSY<"loc", 0xEBF2, nonvolatile_load, GR32, 4>; Unfortunately, that "nonvolatile_load" operator is simply ignored by the CondUnaryRSY class, and there was no test to catch it. llvm-svn: 285077	2016-10-25 15:39:15 +00:00
Simon Pilgrim	41c6a1165b	[X86][SSE] Add support for (V)PMOVSX* constant folding We already have (V)PMOVZX* combining support, this is the beginning of handling (V)PMOVSX* similarly - other combines in combineVSZext can be generalized in future patches. This unearthed an interesting bug in that we were generating illegal build vectors on 32-bit targets - it was proving difficult to create a test for it from PMOVZX, but it fired immediately with PMOVSX. I've created a more general form of the existing getConstVector to handle these cases - ideally this should be handled in non-target-specific code but I couldn't find an equivalent. Differential Revision: https://reviews.llvm.org/D25874 llvm-svn: 285072	2016-10-25 14:29:25 +00:00
Benjamin Kramer	058cf40c31	Fix an unused warning in WebAssemblyInstPrinter with NDEBUG. Patch by Sam McCall! Differential Revision: https://reviews.llvm.org/D25934 llvm-svn: 285055	2016-10-25 09:08:50 +00:00
Craig Topper	929845dd01	[AVX-512] Add support for creating SIGN_EXTEND_VECTOR_INREG and ZERO_EXTEND_VECTOR_INREG for 512-bit vectors to support vpmovzxbq and vpmovsxbq. Summary: The one tricky thing about this is that the sign/zero_extend_inreg uses v64i8 as an input type which isn't legal without BWI support. Though the vpmovsxbq and vpmovzxbq instructions themselves don't require BWI. To support this we need to add custom lowering for ZERO_EXTEND_VECTOR_INREG with v64i8 input. This can mostly reuse the existing sign extend code with a couple checks for sign extend vs zero extend added. Reviewers: delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25594 llvm-svn: 285053	2016-10-25 04:00:29 +00:00
Matthias Braun	456a8212fd	MachineInstrBundle: Pass iterators to getBundle(Start\|End); NFC This is a function to go backwards in a block to find the first instruction in a bundle, so iterator is a more natural choice for parameter/return rather than a reference to a MachineInstruction. llvm-svn: 285051	2016-10-25 02:55:17 +00:00
Dan Gohman	6a5c3acce4	[WebAssembly] Reorder load/store operands to match binary encoding. The p2align operand of a load/store is encoded before the offset operand; reorder the MachineInstr operands accordingly. llvm-svn: 285044	2016-10-25 00:17:11 +00:00
Dan Gohman	4d43f941c8	[WebAssembly] Implement more WebAssembly binary encoding. This changes locals from being declared by the emitLocal hook in WebAssemblyTargetStreamer, rather than with an instruction. After exploring the infastructure in LLVM more, this seems to make more sense since declaring locals doesn't use an encoded opcode. This also adds more 0xd opcodes, type encodings, and miscellaneous binary encoding bits. llvm-svn: 285040	2016-10-24 23:27:49 +00:00
Matthias Braun	9688a1523c	CodeGen/Passes: Pass MachineFunction as functor arg; NFC Passing a MachineFunction as argument is more natural and avoids an unnecessary round-trip through the logic determining the correct Subtarget because MachineFunction already has a reference anyway. llvm-svn: 285039	2016-10-24 23:23:02 +00:00
Matthias Braun	7fbfea2c1c	Use MachineInstr::mop_iterator instead of MIOperands; NFC (Const)?MIOperands is equivalent to the C++ style MachineInstr::mop_iterator. Use the latter for consistency except for a few callers of MIOperands::analyzePhysReg(). llvm-svn: 285029	2016-10-24 21:36:43 +00:00
Dan Gohman	84936ca9c3	[WebAssembly] Fix a broken URL. llvm-svn: 285017	2016-10-24 20:35:17 +00:00
Dan Gohman	c9b2af84a2	[WebAssembly] Define the `end` opcode value. CFGStackify differentiates between END_LOOP and END_BLOCK, but wasm itself doesn't. For now, just use the same opcode for both. llvm-svn: 285016	2016-10-24 20:32:04 +00:00
Dan Gohman	af7dc16568	[WebAssembly] Update opcode values according to recent spec changes. This corresponds to the "0xd" opcode renumbering. llvm-svn: 285014	2016-10-24 20:21:49 +00:00
Dan Gohman	a62715a542	[WebAssembly] Add an option to make get_local/set_local explicit. This patch adds a pass, controlled by an option and off by default for now, for making implicit get_local/set_local explicit. This simplifies emitting wasm with MC. Differential Revision: https://reviews.llvm.org/D25836 llvm-svn: 285009	2016-10-24 19:49:43 +00:00
Peter Collingbourne	543d51e147	Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. These functions are about classifying a global which will actually be emitted, so it does not make sense for them to take a GlobalValue which may for example be an alias. Change the Mach-O object writer and the Hexagon, Lanai and MIPS backends to look through aliases before using TargetLoweringObjectFile interfaces. These are functional changes but all appear to be bug fixes. Differential Revision: https://reviews.llvm.org/D25917 llvm-svn: 285006	2016-10-24 19:23:39 +00:00
Krzysztof Parzyszek	91934888c9	Revert r284972 and remove other defaulted copy/move constructors/= David Blaikie pointed out that we get them for free without having to write anything. llvm-svn: 284996	2016-10-24 17:40:46 +00:00
Ehsan Amiri	e2a6f26a8d	[PPC] Generate positive FP zero using xor insn instead of loading from constant area https://reviews.llvm.org/D23614 Currently we load +0.0 from constant area. That can change to be generated using XOR instruction. llvm-svn: 284995	2016-10-24 17:31:09 +00:00
Eli Friedman	b17f521ba9	Revert r284580+r284917. ("Synthesize TBB/TBH instructions") The optimization has correctness issues, so reverting for now to fix tests on thumb1 targets. llvm-svn: 284993	2016-10-24 17:20:50 +00:00
Evandro Menezes	6efb1b6691	[AArch64] Optionally use the Newton series for reciprocal estimation Add support for estimating the square root or its reciprocal and division or reciprocal using the combiner generic Newton series. Differential revision: https://reviews.llvm.org/D25291 llvm-svn: 284986	2016-10-24 16:14:58 +00:00
Ehsan Amiri	851e2f985f	[PPC] Better codegen for AND, ANY_EXT, SRL sequence https://reviews.llvm.org/D24924 This improves the code generated for a sequence of AND, ANY_EXT, SRL instructions. This is a targetted fix for this special pattern. The pattern is generated by target independet dag combiner and so a more general fix may not be necessary. If we come across other similar cases, some ideas for handling it are discussed on the code review. llvm-svn: 284983	2016-10-24 15:46:58 +00:00
Nicolai Haehnle	274380d507	AMDGPU: Fix Two Address problems with v_movreld Summary: The v_movreld machine instruction is used with three operands that are in a sense tied to each other (the explicit VGPR_32 def and the implicit VGPR_NN def and use). There is no way to express that using the currently available operand bits, and indeed there are cases where the Two Address instructions pass does the wrong thing. This patch introduces a new set of pseudo instructions that are identical in intended semantics as v_movreld, but they only have two tied operands. Having to add a new set of pseudo instructions is admittedly annoying, but it's a fairly straightforward and solid approach. The only alternative I see is to try to teach the Two Address instructions pass about Three Address instructions, and I'm afraid that's trickier and is going to end up more fragile. Note that v_movrels does not suffer from this problem, and so this patch does not touch it. This fixes several GL45-CTS.shaders.indexing.* tests. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25633 llvm-svn: 284980	2016-10-24 14:56:02 +00:00
Pavel Labath	0a4bdea033	Remove unused #includes of TimeValue.h. NFC. llvm-svn: 284975	2016-10-24 14:00:26 +00:00
Joel Jones	704e107457	AArch64 ILP32 relocations for assembly and ELF Summary: Add relocations for AArch64 ILP32. Includes: - Addition of definitions for R_AARCH32_* - Definition of new -target-abi: ilp32 - Definition of data layout string - Tests for added relocations. Not comprehensive, but matches existing tests for 64-bit. Renames "CHECK-OBJ" to "CHECK-OBJ-LP64". - Tests for llvm-readobj Reviewers: zatrazz, peter.smith, echristo, t.p.northover Subscribers: aemerson, rengolin, mehdi_amini Differential Revision: https://reviews.llvm.org/D25159 llvm-svn: 284973	2016-10-24 13:37:13 +00:00
Krzysztof Parzyszek	633d6203c9	[RDF] Add default move constructors/assignment operators llvm-svn: 284972	2016-10-24 13:15:20 +00:00
Simon Dardis	6bd27cb7b1	[mips] synci microMIPS instruction definition. Add synci to the microMIPS instruction definitions, mark the MIPS sync & synci as not being part of microMIPS. This does not cover the sync instruction alias, as that will be handled with a different patch. Add sync to the valid tests for microMIPS. Reviewers: vkalintiris Differential Revision: https://reviews.llvm.org/D25795 llvm-svn: 284962	2016-10-24 10:23:59 +00:00
Craig Topper	9f7eaa6bb6	[AVX-512] Remove masked pmin/pmax intrinsics and autoupgrade to native IR. Clang patch to replace 512-bit vector and 64-bit element versions with native IR will follow. llvm-svn: 284955	2016-10-24 04:04:16 +00:00
Simon Pilgrim	9a5c4b864b	[X86][SSE] Add SSE41/AVX1 costs for vector shifts. We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results. llvm-svn: 284939	2016-10-23 16:49:04 +00:00
Simon Pilgrim	7070280c5e	Use APInt::isAllOnesValue instead of popcnt. NFCI. More obvious implementation and faster too. llvm-svn: 284937	2016-10-23 15:09:44 +00:00
Dylan McKay	872c3b9ed5	[AVR] Add the machine code disassembler This adds a super basic implementation of a machine code disassembler. It doesn't support any operands with custom encoding. llvm-svn: 284930	2016-10-22 23:57:59 +00:00
Simon Pilgrim	3e08d68407	[X86][AVX512VL] Added support for combining target 256-bit shuffles to AVX512VL VPERMV3 llvm-svn: 284922	2016-10-22 20:15:39 +00:00
Simon Pilgrim	d94b9f5253	[X86][AVX512] Added support for combining target shuffles to AVX512 VPERMV3 llvm-svn: 284921	2016-10-22 19:53:59 +00:00
James Molloy	3bb4aed2c3	[ARM] Fix crash in ConstantIslands tPCRelJT may not be the first instruction in a block. Check that instead of dereferencing a broken iterator. llvm-svn: 284917	2016-10-22 09:58:37 +00:00
Craig Topper	739246c183	[X86] Add support for printing shuffle comments for VALIGN instructions. llvm-svn: 284915	2016-10-22 06:51:56 +00:00
Craig Topper	853b5fdcbe	[X86] Add support for lowering v4i64 and v8i64 shuffles directly to PALIGNR. I think shuffle combine can figure it out later, but we should try to get it right up front. llvm-svn: 284914	2016-10-22 06:51:52 +00:00
Craig Topper	fa3624e13b	[X86] Remove unnecessary AVX2 check that was already covered by an assertion earlier in the function. NFC llvm-svn: 284913	2016-10-22 06:51:49 +00:00
Craig Topper	541a27e98b	[X86] Remove 128-bit lane handling from the main loop of matchVectorShuffleAsByteRotate. Instead check for is128LaneRepeatedSuffleMask before the loop and just loop over the repeated mask. I plan to use the loop to support VALIGND/Q shuffles so this makes it easier to reuse. llvm-svn: 284912	2016-10-22 06:51:44 +00:00
Simon Pilgrim	489bd19900	[X86][SSE] Use getConstVector helper for VPERMV mask generation. NFCI. llvm-svn: 284911	2016-10-22 06:18:36 +00:00
Konstantin Zhuravlyov	71a0076eb4	[AMDGPU] Perform uchar to float combine for ISD::SINT_TO_FP This will prevent following regression when enabling i16 support (D18049): test/CodeGen/AMDGPU/cvt_f32_ubyte.ll Differential Revision: https://reviews.llvm.org/D25805 llvm-svn: 284891	2016-10-21 22:10:03 +00:00
Tom Stellard	bbc3abcc55	AMDGPU/SI: Fix crash caused by r284267 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25782 llvm-svn: 284875	2016-10-21 20:25:11 +00:00
Peter Collingbourne	1d5e23a582	X86: Improve BT instruction selection for 64-bit values. If a 64-bit value is tested against a bit which is known to be in the range [0..31) (modulo 64), we can use the 32-bit BT instruction, which has a slightly shorter encoding. Differential Revision: https://reviews.llvm.org/D25862 llvm-svn: 284864	2016-10-21 19:57:55 +00:00
Simon Pilgrim	7f8a590289	[X86][AVX512BWVL] Added support for lowering v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284863	2016-10-21 19:54:38 +00:00
Simon Pilgrim	dc55ada824	[X86][AVX512BWVL] Added support for combining target v16i16 shuffles to AVX512BWVL vpermw llvm-svn: 284860	2016-10-21 19:40:29 +00:00
Simon Pilgrim	dd8d6ffe96	[X86][AVX512] Added support for combining target shuffles to AVX512 vpermpd/vpermq/vpermps/vpermd/vpermw llvm-svn: 284858	2016-10-21 19:18:09 +00:00
Krzysztof Parzyszek	d6f2769360	[RDF] Use RegisterId typedef more consistently, NFC llvm-svn: 284857	2016-10-21 19:12:13 +00:00
Krzysztof Parzyszek	18cf3c16d9	[Hexagon] Handle spills of partially defined double vector registers After register allocation it is possible to have a spill of a register that is only partially defined. That in itself it fine, but creates a problem for double vector registers. Stores of such registers are pseudo instructions that are expanded into pairs of individual vector stores, and in case of a partially defined source, one of the stores may use an entirely undefined register. To avoid this, track the defined parts and only generate actual stores for those. llvm-svn: 284841	2016-10-21 16:38:29 +00:00
Derek Schuff	8f7a717a1c	[WebAssembly] Fix for 0xc call_indirect changes Summary: Need to reorder the operands to have the callee as the last argument. Adds a pseudo-instruction, and a pass to lower it into a real call_indirect. This is the first of two options for how to fix the problem. Reviewers: dschuff, sunfish Subscribers: jfb, beanz, mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D25708 llvm-svn: 284840	2016-10-21 16:38:07 +00:00
Abderrazek Zaafrani	6a092a71c6	Set the vectorizer MaxInterleaveFactor for Exynos. llvm-svn: 284839	2016-10-21 16:28:27 +00:00
Simon Pilgrim	f5c4b15e16	[X86] Use DAG::getBuildVector helper wrapper where possible. NFCI. llvm-svn: 284835	2016-10-21 16:07:51 +00:00
Abderrazek Zaafrani	dd8173f5ff	Test commit llvm-svn: 284832	2016-10-21 15:24:08 +00:00
Artem Tamazov	ce7f050fe8	[AMDGPU][mc] Fix ds_min/max[_rtn]_f32 - extra source operand removed. Fixes Bug 28215. Lit tests updated. Differential Revision: https://reviews.llvm.org/D25837 llvm-svn: 284825	2016-10-21 14:49:22 +00:00
Simon Pilgrim	da9223b586	[X86][AVX2] Begun generalizing lowering to VPERMD/VPERMPS in preparation for AVX512 support. llvm-svn: 284823	2016-10-21 13:00:47 +00:00
Simon Pilgrim	eded0974e0	[X86][AVX512] Add mask/maskz writemask support to subvector broadcast shuffle decode comments llvm-svn: 284821	2016-10-21 12:14:24 +00:00
Bjorn Pettersson	1d059f723a	[AArch64] Corrected spill size for DDD register class. NFCI Summary: The spill size was incorrectly set to 196 bits, which isn't a multiple of 8. This problem was detected when experimenting with asserts that the spill size should be a multiple of the byte size. New corrected value for the spill size is set to 192 bits. Note that tablegen (RegisterInfoEmitter) will divide the size set in the RegisterClass definition by 8. So this change should not have any impact on the tablegen output (trunc(192/8) == trunc(196/8) == 24 bytes). Reviewers: t.p.northover Subscribers: llvm-commits, aemerson, rengolin Differential Revision: https://reviews.llvm.org/D25818 llvm-svn: 284814	2016-10-21 09:53:42 +00:00
Michael Kuperstein	ae8887d98b	[X86] Enable interleaved memory access by default This lets the loop vectorizer generate interleaved memory accesses on x86. Differential Revision: https://reviews.llvm.org/D25350 llvm-svn: 284779	2016-10-20 21:04:31 +00:00
Konstantin Zhuravlyov	e9c9763529	[AMDGPU] Make note record name a static const member of target streamer Differential Revision: https://reviews.llvm.org/D25746 llvm-svn: 284760	2016-10-20 18:22:36 +00:00
Konstantin Zhuravlyov	8b4a3ec89d	[AMDGPU] Emit constant address space data in .rodata section and use relocations instead of fixups (amdhsa only) Differential Revision: https://reviews.llvm.org/D25693 llvm-svn: 284759	2016-10-20 18:12:38 +00:00
Simon Pilgrim	6773ac2510	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors We weren't checking for uniform const costs before the general cost, resulting in very high estimates. llvm-svn: 284755	2016-10-20 18:00:35 +00:00
Sanjay Patel	0e75d706d6	[Target] remove TargetRecip class; 2nd try This is a retry of r284495 which was reverted at r284513 due to use-after-scope bugs caused by faulty usage of StringRef. This version also renames a pair of functions: getRecipEstimateDivEnabled() getRecipEstimateSqrtEnabled() as suggested by Eric Christopher. original commit msg: [Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to https://reviews.llvm.org/D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284746	2016-10-20 16:55:45 +00:00
Simon Pilgrim	9091f77856	[CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults. We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases. llvm-svn: 284744	2016-10-20 16:39:11 +00:00
Valery Pykhtin	f531d627ac	[AMDGPU] add fcopysign(f64, f32) pattern Differential revision: https://reviews.llvm.org/D25827 llvm-svn: 284743	2016-10-20 16:17:54 +00:00
Benjamin Kramer	c2de5980d3	Do a sweep over move ctors and remove those that are identical to the default. All of these existed because MSVC 2013 was unable to synthesize default move ctors. We recently dropped support for it so all that error-prone boilerplate can go. No functionality change intended. llvm-svn: 284721	2016-10-20 12:20:28 +00:00
Jonas Paulsson	f8cb784f68	[SystemZ] Post-RA scheduler implementation Post-RA sched strategy and scheduling instruction annotations for z196, zEC12 and z13. This scheduler optimizes decoder grouping and balances processor resources (including side steering the FPd unit instructions). The SystemZHazardRecognizer keeps track of the scheduling state, which can be dumped with -debug-only=misched. Reviers: Ulrich Weigand, Andrew Trick. https://reviews.llvm.org/D17260 llvm-svn: 284704	2016-10-20 08:27:16 +00:00
Peter Collingbourne	a36f4c07c7	X86: Allow expressions to appear as u8imm operands. llvm-svn: 284688	2016-10-20 01:58:34 +00:00
Peter Collingbourne	355597a200	X86: Deduplicate some lowering code. NFCI. llvm-svn: 284686	2016-10-20 01:21:26 +00:00
Reid Kleckner	f362f03222	Use __func__ directly now that all supported compilers support it Remove the portability macro now that it is unused. llvm-svn: 284681	2016-10-20 00:22:23 +00:00
Wei Ding	dba32d5baf	AMDGPU : Add a function to enable and disable IEEEBit for SC and shader respectively. Differential Revision: http://reviews.llvm.org/D25789 llvm-svn: 284655	2016-10-19 22:34:49 +00:00
Krzysztof Parzyszek	f820888dd5	[AMDGPU] Stop using MCRegisterClass::getSize() Differential Review: https://reviews.llvm.org/D24675 llvm-svn: 284619	2016-10-19 17:40:36 +00:00
Krzysztof Parzyszek	f5fdbce361	[RDF] Switch RefMap in liveness calculation to use lane masks This required reengineering of some of the part of liveness calculation, including fixing some issues caused by the limitations of the previous approach. The current code is not necessarily the fastest, but it should be functionally correct (at least more so than before). The compile-time performance will be addressed in the future. llvm-svn: 284609	2016-10-19 16:30:56 +00:00
Chris Dewhurst	7e66f0654b	[Sparc][LEON] Detects an erratum on UT699 LEON 3 processors involving rounding mode changes and issues an appropriate user error message. Differential Revision: https://reviews.llvm.org/D24665 llvm-svn: 284591	2016-10-19 14:01:06 +00:00
Sjoerd Meijer	3d94a0d81d	Reapply r284571 (with the new tests fixed). llvm-svn: 284588	2016-10-19 13:43:02 +00:00
Ulrich Weigand	947360bd07	[SystemZ] Add missing vector instructions for the assembler Most z13 vector instructions have a base form where the data type of the operation (whether to consider the vector to be 16 bytes, 8 halfwords, 4 words, or 2 doublewords) is encoded into a mask field, and then a set of extended mnemonics where the mask field is not present but the data type is encoded into the mnemonic name. Currently, LLVM only supports the type-specific forms (since those are really the ones needed for code generation), but not the base type-generic forms. To complete the assembler support and make it fully compatible with the GNU assembler, this commit adds assembler aliases for all the base forms of the various vector instructions. It also adds two more alias forms that are documented in the PoP: VFPSO/VFPSODB/WFPSODB -- generic form of VFLCDB etc. VNOT -- special variant of VNO llvm-svn: 284586	2016-10-19 13:03:18 +00:00
Ulrich Weigand	bbbc1ef6a8	[SystemZ] Add optional argument to some vector string instructions The vfee[bhf], vfene[bhf], and vistr[bhf] assembler mnemonics are documented in the Principles of Operation to have an optional last operand to encode arbitrary values in a mask field. This commit adds support for those optional operands, and cleans up the patterns to generate vector string instruction as bit. No change to code generation intended. llvm-svn: 284585	2016-10-19 12:57:46 +00:00
James Molloy	5a58c95fb7	[Thumb-1] Synthesize TBB/TBH instructions to make use of compressed jump tables The TBB and TBH instructions in Thumb-2 allow jump tables to be compressed into sequences of bytes or shorts respectively. These instructions do not exist in Thumb-1, however it is possible to synthesize them out of a sequence of other instructions. It turns out this sequence is so short that it's almost never a lose for performance and is ALWAYS a significant win for code size. TBB example: Before: lsls r0, r0, #2 After: add r0, pc adr r1, .LJTI0_0 ldrb r0, [r0, #6] ldr r0, [r0, r1] lsls r0, r0, #1 mov pc, r0 add pc, r0 => No change in prologue code size or dynamic instruction count. Jump table shrunk by a factor of 4. The only case that can increase dynamic instruction count is the TBH case: Before: lsls r0, r4, #2 After: lsls r4, r4, #1 adr r1, .LJTI0_0 add r4, pc ldr r0, [r0, r1] ldrh r4, [r4, #6] mov pc, r0 lsls r4, r4, #1 add pc, r4 => 1 more instruction in prologue. Jump table shrunk by a factor of 2. So there is an argument that this should be disabled when optimizing for performance (and a TBH needs to be generated). I'm not so sure about that in practice, because on small cores with Thumb-1 performance is often tied to code size. But I'm willing to turn it off when optimizing for performance if people want (also note that TBHs are fairly rare in practice!) llvm-svn: 284580	2016-10-19 12:06:49 +00:00
Sjoerd Meijer	6540326539	Revert of r284571 because of failing tests. llvm-svn: 284572	2016-10-19 07:45:48 +00:00
Sjoerd Meijer	522c2a6a6a	Checking FP function attribute values and adding more build attribute tests. This renames the function for checking FP function attribute values and also adds more build attribute tests (which are in separate files because build attributes are set per file). Differential Revision: https://reviews.llvm.org/D25625 llvm-svn: 284571	2016-10-19 07:25:06 +00:00
Craig Topper	7301554d8d	[AVX-512] Teach isel lowering that a subvector broadcast being inserted into both halves of a 512-bit vector can be combined into a larger subvector broadcast. Summary: This allows us to create broadcasts of 128-bit vector loads into 512-bit vectors. New patterns added to support 8-bit and 16-bit vector types and v2f64/v2i64->v8f64/v8i64 without DQI instructions. There also fallback patterns when the load can't be folded. These patterns are a little complex as we first need to insert the lower 128-bits into the second 128-bits using a zmm subvector insert instruction. We need to use a zmm insert in case VLX isn't available. Then use another zmm sub vector insert to take those 256-bits and insert them into the upper bits. Since we used a zmm insert to create the 256-bits we also need to do a extract_subreg to get just the lower 256-bits to pass to the second insert. The outer insert for the fallback patterns should have its type correct because eventually we should also supported masked operations here too. So we need a DQI and a NoDQI version of the v16f32/v16i32 patterns. Reviewers: RKSimon, delena, igorb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25651 llvm-svn: 284567	2016-10-19 04:44:17 +00:00
Eli Friedman	a5414b4036	Improve ARM lowering for "icmp <2 x i64> eq". The custom lowering is pretty straightforward: basically, just AND together the two halves of a <4 x i32> compare. Differential Revision: https://reviews.llvm.org/D25713 llvm-svn: 284536	2016-10-18 21:03:40 +00:00
Evandro Menezes	39d7a5132d	[AArch64] Avoid materializing 0.0 when generating FP SELECT Transform `a == 0.0 ? 0.0 : x` to `a == 0.0 ? a : x` and `a != 0.0 ? x : 0.0` to `a != 0.0 ? x : a` to avoid materializing 0.0 for FCSEL, since it does not have to be materialized beforehand for FCMP, as it has a form that has 0.0 as an implicit operand. Differential Revision: https://reviews.llvm.org/D24808 llvm-svn: 284531	2016-10-18 20:37:35 +00:00
Tim Northover	49c73af459	GlobalISel: select small binary operations on AArch64. AArch64 actually supports many 8-bit operations under the definition used by GlobalISel: the designated information-carrying bits of a GPR32 get the right value if you just use the normal 32-bit instruction. llvm-svn: 284526	2016-10-18 20:03:48 +00:00
Tim Northover	189324351a	GlobalISel: support floating-point constants on AArch64. Patch from Ahmed Bougacha. llvm-svn: 284523	2016-10-18 19:47:57 +00:00
Krzysztof Parzyszek	7adf6be710	[Hexagon] Handle block live-ins with lane masks in HexagonBlockRanges llvm-svn: 284522	2016-10-18 19:47:20 +00:00
Benjamin Kramer	641730af4a	Reduce global namespace pollution. NFC. llvm-svn: 284521	2016-10-18 19:39:31 +00:00
Sanjay Patel	f5a046ca77	revert r284495: [Target] remove TargetRecip class There's something wrong with the StringRef usage while parsing the attribute string. llvm-svn: 284513	2016-10-18 18:36:49 +00:00
Sanjay Patel	399c026728	[Target] remove TargetRecip class; move reciprocal estimate isel functionality to TargetLowering This is a follow-up to D24816 - where we changed reciprocal estimates to be function attributes rather than TargetOptions. This patch is intended to be a structural, but not functional change. By moving all of the TargetRecip functionality into TargetLowering, we can remove all of the reciprocal estimate state, shield the callers from the string format implementation, and simplify/localize the logic needed for a target to enable this. If a function has a "reciprocal-estimates" attribute, those settings may override the target's default reciprocal preferences for whatever operation and data type we're trying to optimize. If there's no attribute string or specific setting for the op/type pair, just use the target default settings. As noted earlier, a better solution would be to move the reciprocal estimate settings to IR instructions and SDNodes rather than function attributes, but that's a multi-step job that requires infrastructure improvements. I intend to work on that, but it's not clear how long it will take to get all the pieces in place. Differential Revision: https://reviews.llvm.org/D25440 llvm-svn: 284495	2016-10-18 17:05:05 +00:00
Simon Pilgrim	f7faf934bc	[X86][AVX512] Add mask/maskz writemask support to constant pool shuffle decode commentx llvm-svn: 284488	2016-10-18 15:45:37 +00:00
Simon Dardis	c18e1a3a86	[mips][ias] Handle more complicated expressions for memory operands This patch teaches ias for mips to handle expressions such as (84)+(831)($sp). Such expression typically occur from the expansion of multiple macro definitions. This partially resolves PR/30383. Thanks to Sean Bruno for reporting the issue! Reviewers: zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D24667 llvm-svn: 284485	2016-10-18 15:17:17 +00:00
Simon Dardis	854890c453	[mips] Fix sync instruction definition The 'sync' instruction for MIPS was defined in MIPS-II as taking no operands. MIPS32 extended the define of 'sync' as taking an optional unsigned 5 bit immediate. This patch correct the definition of sync so that it is accepted with an operand of 0 or no operand for MIPS-II to MIPS-V, and a 5 bit unsigned immediate for MIPS32 and later revisions. Additionally a clear error is given when the MIPS32 version of sync is used when targeting pre MIPS32. This partially resolves PR/30714. Thanks to Daniel Sanders for reporting this issue! Reveiwers: vkalintiris Differential Revision: https://reviews.llvm.org/D25672 llvm-svn: 284483	2016-10-18 14:42:13 +00:00
Simon Dardis	79593aa13b	[mips] Macro expansion for ld, sd for O32 ld and sd when assembled for the O32 ABI expand to a pair of 32 bit word loads or stores using the specified source or destination register and the next register. This patch does not add support for the cases where the offset is greater than a 16 bit signed immediate as that would lead to a wrong/misleading error message as the assembler would report "instruction requires a CPU feature not currently enabled" for ld & sd for MIPS64 when their offset is not a signed 16 bit number. This fixes PR/29159. Thanks to Sean Bruno for reporting this issue! Reviewers: vkalintiris, seanbruno, zoran.jovanovic Differential Review: https://reviews.llvm.org/D24556 llvm-svn: 284481	2016-10-18 14:28:00 +00:00
Michael Zuckerman	89d464b0ab	[x86][inline-asm][avx512] allow swapping of '{k<num>}' & '{z}' marks Committing on behalf of Coby Tayree: After check-all and LGTM Desc: AVX512 allows dest operand to be followed by an op-mask register specifier ('{k<num>}', which in turn may be followed by a merging/zeroing specifier ('{z}') Currently, the following forms are allowed: {k<num>} {k<num>}{z} This patch allows the following forms: {z}{k<num>} and ignores the next form: {z} Justification would be quite simple - GCC Differential Revision: http://reviews.llvm.org/D25013 llvm-svn: 284479	2016-10-18 13:52:39 +00:00
Vasileios Kalintiris	083354fdfb	[mips][FastISel] Instantiate the MipsFastISel class only for targets that support FastISel. Summary: Instead of instantiating the MipsFastISel class and checking if the target is supported in the overriden methods, we should perform that check before creating the class. This allows us to enable FastISel only for targets that truly support it, ie. MIPS32 to MIPS32R5. Reviewers: sdardis Subscribers: ehostunreach, llvm-commits Differential Revision: https://reviews.llvm.org/D24824 llvm-svn: 284475	2016-10-18 13:05:42 +00:00
Javed Absar	90db73a094	[ARM] Assign cost of scaling for Cortex-R52 This patch assigns cost of the scaling used in addressing for Cortex-R52. On Cortex-R52 a negated register offset takes longer than a non-negated register offset, in a register-offset addressing mode. Differential Revision: http://reviews.llvm.org/D25670 Reviewer: jmolloy llvm-svn: 284460	2016-10-18 09:08:54 +00:00
Simon Pilgrim	293c1a3deb	[X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32 As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types. This patch adds support for fptosi to 2i32 from both 2f64 and 2f32. It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch. Differential Revision: https://reviews.llvm.org/D23808 llvm-svn: 284459	2016-10-18 07:42:15 +00:00
Dean Michael Berris	b135433036	[XRay] Support for for tail calls for ARM no-Thumb This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456	2016-10-18 05:54:15 +00:00
Craig Topper	f805b093b4	[X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453	2016-10-18 04:48:33 +00:00
Craig Topper	a71d800155	[AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451	2016-10-18 04:00:32 +00:00
Craig Topper	baaf1b7511	[AVX-512] Add support for decoding shuffle mask from constant pool for masked VPERMILPS/PD. llvm-svn: 284450	2016-10-18 03:36:52 +00:00
Konstantin Zhuravlyov	f1c9f3143b	[AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for it Differential Revision: https://reviews.llvm.org/D25694 llvm-svn: 284435	2016-10-17 22:40:15 +00:00
Tim Northover	2bc7209c52	GlobalISel: support wider range of load/store sizes in AArch64. llvm-svn: 284406	2016-10-17 18:36:53 +00:00
Tom Stellard	e57c681680	AMDGPU/SI: LowerParameter() should be computing align based on memory type Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25203 llvm-svn: 284398	2016-10-17 16:56:19 +00:00
Tom Stellard	a4414c957a	AMDGPU/SI: Fix LowerParameter() for i16 arguments Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397	2016-10-17 16:21:45 +00:00
Craig Topper	4538af9fb0	[X86] Fix shuffle decoding assertions to print the right number of required operands. Update the checks themselves to be >= to the same number instead of > one less than the required number. llvm-svn: 284365	2016-10-17 06:41:18 +00:00
Craig Topper	0b7e675d9c	[AVX-512] Add shuffle combining support for vpermi2var shuffles derived from existing support for vpermt2var. llvm-svn: 284357	2016-10-17 04:26:47 +00:00
Craig Topper	6621b37696	[AVX-512] Add support for turning a 256-bit load that goes to both halfs of an insert_subvector into a subvector broadcast. Differential Revision: https://reviews.llvm.org/D25650 llvm-svn: 284353	2016-10-16 23:29:51 +00:00
Craig Topper	de787f6ced	[AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the other vpermi2var intrinsics. llvm-svn: 284329	2016-10-16 04:54:35 +00:00
Craig Topper	fc3bffdd20	[AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS. llvm-svn: 284328	2016-10-16 04:54:31 +00:00
Craig Topper	a6d8c5e30e	[AVX-512] Move (v4i64 (X86SubVBroadcast (v2i64))) alternate patterns under a HasVLX predicate. Similar for floating point. llvm-svn: 284327	2016-10-16 04:54:26 +00:00
Davide Italiano	a92b11f81e	[ArmFastISel] Kill dead code. NFCI. llvm-svn: 284320	2016-10-16 01:09:39 +00:00
Konstantin Zhuravlyov	1aefc15a1a	[MachineMemOperand] Move synchronization scope and atomic orderings from SDNode to MachineMemOperand, and remove redundant getAtomic* member functions from SelectionDAG. Differential Revision: https://reviews.llvm.org/D24577 llvm-svn: 284312	2016-10-15 22:01:18 +00:00
Craig Topper	865d3464cd	[AVX-512] Add shuffle comments for vbroadcast instructions. llvm-svn: 284305	2016-10-15 16:26:07 +00:00
Craig Topper	86fd8363e9	[AVX-512] Rename VPBROADCASTI32X2 and VPBROADCASTF32X2 instruction classes to match the mnemonic which does not include a 'P'. llvm-svn: 284304	2016-10-15 16:26:02 +00:00
Tom Stellard	eeebdabf5d	AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizer Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25526 llvm-svn: 284298	2016-10-15 00:58:14 +00:00
Tim Northover	dc91ae935f	GlobalISel: rename legalizer components to match others. The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine") prefix. This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287	2016-10-14 22:18:18 +00:00
Guozhi Wei	19888a3d98	[PPC] Shorter sequence to load 64bit constant with same hi/lo words This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register. This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276	2016-10-14 20:41:50 +00:00

... 2 3 4 5 6 ...

40090 Commits