llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Simon Pilgrim	2b08de462d	[X86] combineVSelectWithAllOnesOrZeros - cleanup variable names. NFCI. We were reusing the 'false' select value 'is zero' variable name for the 'true' select value 'is zero' variable name. llvm-svn: 313528	2017-09-18 12:55:54 +00:00
Nikolai Bozhenov	59948ba101	[X86FixupBWInsts] More precise register liveness if no <imp-use> on MOVs. Summary: Subregister liveness tracking is not implemented for X86 backend, so sometimes the whole super register is said to be live, when only a subregister is really live. That might happen if the def and the use are located in different MBBs, see added fixup-bw-isnt.mir test. However, using knowledge of the specific instructions handled by the bw-fixup-pass we can get more precise liveness information which this change does. Reviewers: MatzeB, DavidKreitzer, ab, andrew.w.kaylor, craig.topper Reviewed By: craig.topper Subscribers: n.bozhenov, myatsina, llvm-commits, hiraditya Patch by Andrei Elovikov <andrei.elovikov@intel.com> Differential Revision: https://reviews.llvm.org/D37559 llvm-svn: 313524	2017-09-18 10:17:59 +00:00
Craig Topper	e285fd6929	[X86] Strengthen some of the SD type constraints in X86InstrFragmentsSIMD.td This affects the vector shift and rotates as well as some of the vector compares. The changes to the shifts by immediates allow a few hundred bytes to be removed by removing type checks for the size of the immediate containing the shift/rotate amount. llvm-svn: 313512	2017-09-18 05:50:54 +00:00
Craig Topper	2597a3e267	[X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Craig Topper	324ef34591	[X86] Teach execution domain fixing to convert between FP and int unpack instructions. llvm-svn: 313508	2017-09-18 03:29:54 +00:00
Craig Topper	2e6c540c21	[X86] Teach execution domain fixing to convert between VPERMILPS and VPSHUFD. llvm-svn: 313507	2017-09-18 03:29:47 +00:00
Craig Topper	5cd79a5b17	[X86] Remove the X86ISD::MOVLHPD. Lowering doesn't use it and it's not a real instruction. It was used in patterns, but we had the exact same patterns with Unpckl as well. So now just use Unpckl in the instruction patterns. llvm-svn: 313506	2017-09-18 00:20:53 +00:00
Craig Topper	7d985533eb	[X86] Teach shuffle lowering to use MOVLHPS/MOVHLPS for lowering v4f32 unary shuffles with SSE1 only. llvm-svn: 313504	2017-09-17 22:36:41 +00:00
Craig Topper	c4365274b5	[X86] Synchronize a pattern between SSE1 and AVX/AVX512. For some reason the SSE1 pattern expected a X86Movlhps pattern to have a v4f32 type, but AVX and AVX512 expected it to have a v4i32 type. I'm not even sure this pattern is even reachable post SSE1, but I'm starting with fixing this obvious bug. llvm-svn: 313495	2017-09-17 18:59:32 +00:00
Craig Topper	0683698331	[X86] Colocate all of the X86VBroadcast patterns for v2i64 and v2f64. NFC The memory patterns were near the MOVDDUP definition, but the non-memory patterns were near the broadcast instructions. llvm-svn: 313494	2017-09-17 18:59:30 +00:00
Craig Topper	eff0aa5cbe	[X86] Remove patterns for X86Movddup with v4i64 type. Lowering doesn't emit these. llvm-svn: 313493	2017-09-17 18:59:28 +00:00
Craig Topper	50e9e8ffcb	[X86] Remove isel patterns for X86Movhlps and X86Movlhps with integer types. Lowering doesn't emit these. llvm-svn: 313492	2017-09-17 18:59:26 +00:00
Craig Topper	18143e1869	[X86] Remove isel patterns for movlpd/movlps with integer types. Lowering doesn't emit these. llvm-svn: 313491	2017-09-17 18:59:24 +00:00
Alex Bradbury	2e19dd3e02	[RISCV] Add support for disassembly This Disassembly support allows for 'round-trip' testing, and rv32i-valid.s has been updated appropriately. Differential Revision: https://reviews.llvm.org/D23567 llvm-svn: 313486	2017-09-17 14:36:28 +00:00
Alex Bradbury	6e94e164a1	[RISCV] Add support for all RV32I instructions This patch supports all RV32I instructions as described in the RISC-V manual. A future patch will add support for pseudoinstructions and other instruction expansions (e.g. 0-arg fence -> fence iorw, iorw). Differential Revision: https://reviews.llvm.org/D23566 llvm-svn: 313485	2017-09-17 14:27:35 +00:00
Igor Breger	8308dd9d01	[GlobalISel][X86] refactoring X86InstructionSelector.cpp .NFC. llvm-svn: 313484	2017-09-17 14:02:19 +00:00
Igor Breger	134168987a	[GlobalISel][X86] Legalize i1 G_ADD/G_SUB/G_MUL/G_XOR/G_OR/G_AND instructions. llvm-svn: 313483	2017-09-17 11:34:17 +00:00
Igor Breger	f5aca70376	[GlobalISel][X86] G_FCONSTANT support. Summary: G_FCONSTANT support, port the implementation from X86FastIsel. Reviewers: zvi, delena, guyblank Reviewed By: delena Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37734 llvm-svn: 313478	2017-09-17 08:08:13 +00:00
Craig Topper	c9df7024c7	[X86] Remove integer X86ISD::SHUFP patterns. Lowering doesn't emit these. llvm-svn: 313477	2017-09-17 06:09:32 +00:00
Craig Topper	2a324415cd	[X86] Add patterns to make blends with immediate control commutable during isel for load folding. llvm-svn: 313476	2017-09-17 05:06:05 +00:00
Craig Topper	d1020c20b9	[X86] Remove some unused defaults from some multiclass parameters. llvm-svn: 313475	2017-09-17 05:06:03 +00:00
Craig Topper	16eca3ad0c	[X86] Make PCLMULQDQ instructions commutable during isel to fold loads. This adds new patterns and SDNodeXForm to enable the immediate to be commuted. llvm-svn: 313472	2017-09-16 23:18:50 +00:00
Craig Topper	dbb60756e2	[X86] Add NoAVX predicates to the patterns for the legacy encoded PCLMUL and AES instructions. Previously we were just relying on pattern order to define precedence. Which works, but isn't the best way. llvm-svn: 313471	2017-09-16 23:18:48 +00:00
Craig Topper	3ab295c547	[X86] Remove some extra code that snuck into r313450. The same code appears earlier in the function. This represents an earlier version of what became r313373 that I still had sitting in my local repo. llvm-svn: 313465	2017-09-16 17:51:55 +00:00
Sanjay Patel	62d8ec1120	[x86] enable storeOfVectorConstantIsCheap() target hook This allows vector-sized store merging of constants in DAGCombiner using the existing code in MergeConsecutiveStores(). All of the twisted logic that decides exactly what vector operations are legal and fast for each particular CPU are handled separately in there using the appropriate hooks. For the motivating tests in merge-store-constants.ll, we already produce the same vector code in IR via the SLP vectorizer. So this is just providing a backend backstop for code that doesn't go through that pass (-O1). More details in PR24449: https://bugs.llvm.org/show_bug.cgi?id=24449 (this change should be the last step to resolve that bug) Differential Revision: https://reviews.llvm.org/D37451 llvm-svn: 313458	2017-09-16 13:29:12 +00:00
Craig Topper	0b5789f3cc	[X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode. We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but it's easy enough to fix in isel. Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel. llvm-svn: 313455	2017-09-16 09:16:48 +00:00
Craig Topper	7542d6cd2e	[X86] Remove VPERM2X128 isel patterns with 32-bit elements. Now that the intrinsics are gone we only need 64-bit elements since that's what shuffle lowering uses. llvm-svn: 313453	2017-09-16 08:15:52 +00:00
Craig Topper	9f5737a6bd	[X86] Remove VPERM2F128/VPERM2I128 intrinsics and autoupgrade to native shuffles. I've moved the test cases from the InstCombine optimizations to the backend to keep the coverage we had there. It covered every possible immediate so I've preserved the resulting shuffle mask for each of those immediates. llvm-svn: 313450	2017-09-16 07:36:14 +00:00
Sam Clegg	980730befa	Change encodeU/SLEB128 to pad to certain number of bytes Previously the 'Padding' argument was the number of padding bytes to add. However most callers that use 'Padding' know how many overall bytes they need to write. With the previous code this would mean encoding the LEB once to find out how many bytes it would occupy and then using this to calculate the 'Padding' value. See: https://reviews.llvm.org/D36595 Differential Revision: https://reviews.llvm.org/D37494 llvm-svn: 313393	2017-09-15 20:34:47 +00:00
Mandeep Singh Grang	324ae49da1	[llvm] Fix some typos. NFC. Reviewers: mcrosier Reviewed By: mcrosier Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D37922 llvm-svn: 313388	2017-09-15 20:01:43 +00:00
Hans Wennborg	36d48161a2	Revert r313343 "[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs." This caused PR34629: asserts firing when building Chromium. It also broke some buildbots building test-suite as reported on the commit thread. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will now look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet > > Reviewed By: lsaba > > Subscribers: spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313376	2017-09-15 18:40:26 +00:00
Craig Topper	877a55b379	[X86] Prefer VPERMQ over VPERM2F128 for any unary shuffle, not just the ones that can be done with a insertf128 The early out for AVX2 in lowerV2X128VectorShuffle is positioned in a weird spot below some shuffle mask equivalency checks. But I think we want to allow VPERMQ for any unary shuffle. Differential Revision: https://reviews.llvm.org/D37893 llvm-svn: 313373	2017-09-15 18:11:13 +00:00
Craig Topper	a95407b7df	[X86] Use SDNode::ops() instead of makeArrayRef and op_begin(). NFCI llvm-svn: 313367	2017-09-15 17:09:05 +00:00
Craig Topper	82b1d47df9	[X86] Don't create i64 constants on 32-bit targets when lowering v64i1 constant build vectors When handling a v64i1 build vector of constants on 32-bit targets we were creating an illegal i64 constant that we then bitcasted back to v64i1. We need to instead create two 32-bit constants, bitcast them to v32i1 and concat the result. We should also take care to handle the halves being all zeros/ones after the split. This patch splits the build vector and then recursively lowers the two pieces. This allows us to handle the all ones and all zeros cases with minimal effort. Ideally we'd just do the split and concat, and let lowering get called again on the new nodes, but getNode has special handling for CONCAT_VECTORS that reassembles the pieces back into a single BUILD_VECTOR. Hopefully the two temporary BUILD_VECTORS we had to create to do this that don't get returned don't cause any issues. Fixes PR34605. Differential Revision: https://reviews.llvm.org/D37858 llvm-svn: 313366	2017-09-15 17:09:03 +00:00
Craig Topper	ddf0ac4f90	[X86] Add isel pattern infrastructure to begin recognizing when we're inserting 0s into the upper portions of a vector register and the producing instruction has already produced the zeros. Currently if we're inserting 0s into the upper elements of a vector register we insert an explicit move of the smaller register to implicitly zero the upper bits. But if we can prove that they are already zero we can skip that. This is based on a similar idea of what we do to avoid emitting explicit zero extends for GR32->GR64. Unfortunately, this is harder for vector registers because there are several opcodes that don't have VEX equivalent instructions, but can write to XMM registers. Among these are SHA instructions and a MMX->XMM move. Bitcasts can also get in the way. So for now I'm starting with explicitly allowing only VPMADDWD because we emit zeros in combineLoopMAddPattern. So that is placing an extra instruction into the reduction loop. I'd like to allow PSADBW as well after D37453, but that's currently blocked by a bitcast. We either need to peek through bitcasts or canonicalize insert_subvectors with zeros to remove bitcasts on the value being inserted. Longer term we should probably have a cleanup pass that removes superfluous zeroing moves even when the producer is in another basic block which is something these isel tricks can't do. See PR32544. Differential Revision: https://reviews.llvm.org/D37653 llvm-svn: 313365	2017-09-15 17:09:00 +00:00
Krzysztof Parzyszek	325bb38667	[Hexagon] Switch to parameterized register classes for HVX This removes the duplicate HVX instruction set for the 128-byte mode. Single instruction set now works for both modes (64- and 128-byte). llvm-svn: 313362	2017-09-15 15:46:05 +00:00
Sjoerd Meijer	386ba01b9c	[AArch64] allow v8f16 types when FullFP16 is supported This adds support for allowing v8f16 vector types, thus avoiding conversions from/to single precision for these types. This is a follow up patch of commits r311154 and r312104, which added support for scalars and v4f16 types, respectively. Differential Revision: https://reviews.llvm.org/D37802 llvm-svn: 313351	2017-09-15 09:24:48 +00:00
Jatin Bhateja	856f7f79e2	[X86] PR32755 : Improvement in CodeGen instruction selection for LEAs. Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will now look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet Reviewed By: lsaba Subscribers: spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 313343	2017-09-15 05:29:51 +00:00
Craig Topper	3bcc062cdc	[X86] Remove an unnecessary SmallVector from LowerBUILD_VECTOR. I think this may have existed to convert from SDUse to SDValue, but it doesn't look like it's needed now. llvm-svn: 313311	2017-09-14 22:47:59 +00:00
Jan Sjodin	43e732c065	Fix warnings in r313297. llvm-svn: 313302	2017-09-14 21:49:52 +00:00
Matt Arsenault	a8560d8853	AMDGPU: Fix violating constant bus restriction You can't use madak/madmk if it already uses an SGPR input. llvm-svn: 313298	2017-09-14 20:54:29 +00:00
Jan Sjodin	242b2dcd0b	Add AddressSpace to PseudoSourceValue. Differential Revision: https://reviews.llvm.org/D35089 llvm-svn: 313297	2017-09-14 20:53:51 +00:00
Matt Arsenault	f18ea9e4aa	AMDGPU: Fix assert on alloca of array of struct llvm-svn: 313282	2017-09-14 18:02:29 +00:00
Matt Arsenault	0a3745b2a6	AMDGPU: Stop modifying SP in call sequences Because the stack growth direction and addressing is done in the same direction, modifying SP at the beginning of the call sequence was incorrect. If we had a stack passed argument, we would end up skipping that number of bytes before pushing arguments, leaving unused/inconsistent space. The callee creates fixed stack objects in its frame, so the space necessary for these is already logically allocated in the callee, so we just let the callee increment SP if it really requires it. llvm-svn: 313279	2017-09-14 17:37:40 +00:00
Simon Dardis	c8036cbb4c	[mips] Implement the 'dext' aliases and its disassembly alias. The other members of the dext family of instructions (dextm, dextu) are traditionally handled by the assembler selecting the right variant of 'dext' depending on the values of the position and size operands. When these instructions are disassembled, rather than reporting the actual instruction, an equivalent aliased form of 'dext' is generated and is reported. This is to mimic the behaviour of binutils. Reviewers: slthakur, nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D34887 llvm-svn: 313276	2017-09-14 17:27:53 +00:00
Matt Arsenault	332360a091	AMDGPU: Make frame register caller preserved Using SplitCSR for the frame register was very broken. Often the copies in the prolog and epilog were optimized out, in addition to them being inserted after the true prolog where the FP was clobbered. I have a hacky solution which works that continues to use split CSR, but for now this is simpler and will get to working programs. llvm-svn: 313274	2017-09-14 17:14:57 +00:00
Simon Dardis	279ec81c7c	[mips] Implement the 'dins' aliases. Traditionally GAS has provided automatic selection between dins, dinsm and dinsu. Binutils also disassembles all instructions in that family as 'dins' rather than the actual instruction. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34877 llvm-svn: 313267	2017-09-14 15:17:50 +00:00
Aleksandar Beserminji	22c7d9115b	Test commit. llvm-svn: 313262	2017-09-14 14:34:04 +00:00
Krzysztof Parzyszek	805df76dbe	[Hexagon] Make getMemAccessSize return size in bytes It used to return the actual field value from the instruction descriptor. There is no reason for that, that value is not interesting in any way and the specifics of its encoding in the descriptor should not be exposed. llvm-svn: 313257	2017-09-14 12:06:40 +00:00
Ayman Musa	7c7c79c212	[X86] When applying the shuffle-to-zero-extend transformation on floating point, bitcast to integer first. Fix issue described in PR34577. Differential Revision: https://reviews.llvm.org/D37803 llvm-svn: 313256	2017-09-14 12:06:38 +00:00
Simon Dardis	c2db7efb6c	[mips] Pick the right variant of DINS upfront and enable target instruction verification This patch complements D16810 "[mips] Make isel select the correct DEXT variant up front.". Now ISel picks the right variant of DINS, so now there is no need to replace DINS with the appropriate variant during MipsMCCodeEmitter::encodeInstruction(). This patch also enables target specific instruction verification for ins, dins, dinsm, dinsu, ext, dext, dextm, dextu. These instructions have constraints that are checked when generating MipsISD::Ins and MipsISD::Ext nodes, but these constraints are not checked during instruction selection. Adding machine verification should catch outstanding cases. Finally, correct a bug that instruction verification uncovered, where the position operand of a DINSU generated during lowering was being silently and accidentally corrected to the correct value. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D34809 llvm-svn: 313254	2017-09-14 10:58:00 +00:00
Matt Arsenault	668655a056	AMDGPU: Don't spill SP reg like a normal CSR llvm-svn: 313217	2017-09-13 23:47:01 +00:00
Stanislav Mekhanoshin	fbfa163a41	Allow target to decide when to cluster loads/stores in misched MachineScheduler when clustering loads or stores checks if base pointers point to the same memory. This check is done through comparison of base registers of two memory instructions. This works fine when instructions have separate offset operand. If they require a full calculated pointer such instructions can never be clustered according to such logic. Changed shouldClusterMemOps to accept base registers as well and let it decide what to do about it. Differential Revision: https://reviews.llvm.org/D37698 llvm-svn: 313208	2017-09-13 22:20:47 +00:00
Matt Arsenault	dd4680bcec	AMDGPU: Handle coldcc in more places Missed in r312936 llvm-svn: 313205	2017-09-13 21:55:52 +00:00
Michael Zuckerman	ca9ecdf6e7	Refactoring the stride 4 code in the X86interleavedaccess NFC llvm-svn: 313166	2017-09-13 18:28:09 +00:00
Petar Jovanovic	288331a64a	[mips] correct operand range for DINSM instruction This patch corrects the definition of the DINSM instruction. Specification for DINSM instruction for Mips64 says that size operand should be 2 <= size <= 64, but it is defined as uimm5_inssize_plus1 which gives range of 1 .. 32. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37683 llvm-svn: 313149	2017-09-13 14:09:13 +00:00
Stefan Pintilie	575a95094e	[Power9] Add missing instructions: extswsli, popcntb Added the following P9 instructions: extswsli, extswsli., popcntb Differential Revision: https://reviews.llvm.org/D37342 llvm-svn: 313147	2017-09-13 14:05:27 +00:00
Igor Breger	27a3be6382	[GlobalISel][X86] support G_FPEXT operation. Summary: Support G_FPEXT operation. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34816 llvm-svn: 313135	2017-09-13 09:05:23 +00:00
Uriel Korach	b69c1280fa	[X86] [PATCH] [intrinsics] Lowering X86 ABS intrinsics to IR. (llvm) This patch, together with a matching clang patch (https://reviews.llvm.org/D37694), implements the lowering of X86 ABS intrinsics to IR. differential revision: https://reviews.llvm.org/D37693. llvm-svn: 313134	2017-09-13 09:02:36 +00:00
Mohammed Agabaria	0dc4e72b98	[X86] Adding X86 Processor Families Adding x86 Processor families to initialize several uArch properties (based on the family) This patch shows how gather cost can be initialized based on the proc. family Differential Revision: https://reviews.llvm.org/D35348 llvm-svn: 313132	2017-09-13 09:00:27 +00:00
Craig Topper	50603fad45	[X86] Make sure we emit a SUBREG_TO_REG after the MOV32ri when creating a BEXTR64rr instruction from a shift/and pair. Fixes PR34589. llvm-svn: 313126	2017-09-13 07:53:21 +00:00
Elena Demikhovsky	2a5d9095cc	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using PMOVZXBD , PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Craig Topper	357b62f45e	[X86] Use isUInt<32> to simplify some code. NFC llvm-svn: 313112	2017-09-13 02:29:59 +00:00
Petr Hosek	fe899bd9df	[Fuchsia] Magenta -> Zircon Fuchsia's lowest API layer has been renamed from Magenta to Zircon. In LLVM proper, this is only mentioned in comments. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D37763 llvm-svn: 313105	2017-09-13 01:18:06 +00:00
Derek Schuff	cd2a10a385	[WebAssembly] Add sign extend instructions from atomics proposal Select them from ISD::SIGN_EXTEND_INREG Differential Revision: https://reviews.llvm.org/D37603 remove spurious change llvm-svn: 313101	2017-09-13 00:29:06 +00:00
Sanjay Patel	e2412b087e	[x86] eliminate unnecessary vector compare for AVX masked store The masked store instruction only cares about the sign-bit of each mask element, so the compare s<0 isn't needed. As noted in PR11210: https://bugs.llvm.org/show_bug.cgi?id=11210 ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.) I filed a bug to track improvements for AVX512: https://bugs.llvm.org/show_bug.cgi?id=34584 Differential Revision: https://reviews.llvm.org/D37446 llvm-svn: 313089	2017-09-12 23:24:05 +00:00
Petar Jovanovic	3ffe632777	[mips] handle UImm16_AltRelaxed match type Currently, UImm16_AltRelaxed match type is not handled in MatchAndEmitInstruction() function, which may result in llvm_unreachable() behavior. This patch adds necessary case for this match type. Patch by Aleksandar Beserminji. Differential Revision: https://reviews.llvm.org/D37682 llvm-svn: 313077	2017-09-12 21:43:33 +00:00
Ahmed Bougacha	f676eb073f	[AArch64][GlobalISel] Select all fpexts. Tablegen already can select these: mark them as legal, remove the c++ code, and add tests for all types. llvm-svn: 313074	2017-09-12 21:04:11 +00:00
Ahmed Bougacha	698d6831e1	[AArch64][GlobalISel] Select all fptruncs. We already support these in tablegen, but we're matching the wrong operator (libm ftrunc). Fix that. While there, drop the c++ code, support COPYs of FPR16, and add tests for the other types. llvm-svn: 313073	2017-09-12 21:04:10 +00:00
Lei Huang	ffe49cfc41	Update branch coalescing to be a PowerPC specific pass Implementing this pass as a PowerPC specific pass. Branch coalescing utilizes the analyzeBranch method which currently does not include any implicit operands. This is not an issue on PPC but must be handled on other targets. Pass is currently off by default. Enabled via -enable-ppc-branch-coalesce. Differential Revision: https://reviews.llvm.org/D32776 llvm-svn: 313061	2017-09-12 18:39:11 +00:00
Yonghong Song	e59f4f68a1	bpf: Add BPF AsmParser support in LLVM Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 313055	2017-09-12 17:55:23 +00:00
Craig Topper	91215998d2	[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so it's as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp. This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we don't recognize yet. We already had this problem for several other TBM patterns so I think this is fine and we can address all of them together. I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before we got to isel. I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch. Differential Revision: https://reviews.llvm.org/D37592 llvm-svn: 313054	2017-09-12 17:40:25 +00:00
Hans Wennborg	5129faa331	Revert r313009 "[ARM] Use ADDCARRY / SUBCARRY" This was causing PR34045 to fire again. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 Also revert follow-up r313010: > [ARM] Fix typo when creating ISD::SUB nodes > > In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, > giving them two values instead of one. > > This fails when the merge_values combiner finds one of these nodes. > > This change fixes PR34564. > > Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313044	2017-09-12 16:24:17 +00:00
Jonas Paulsson	ed6377b66d	[SystemZ] Add the CoveredBySubRegs bit to GPR64, GPR128 and FPR128 registers. This bit is needed in order for the CalleeSavedRegs list to automatically include the super registers if all of their subregs are present. Thanks to Wei Mi for initially indicating this deficiency in the SystemZ backend. Review: Ulrich Weigand. https://bugs.llvm.org/show_bug.cgi?id=34550 llvm-svn: 313023	2017-09-12 12:11:29 +00:00
Sjoerd Meijer	b29245c16e	[AArch64] ISel: Add some debug messages to LowerBUILDVECTOR. NFC. Differential Revision: https://reviews.llvm.org/D37676 llvm-svn: 313017	2017-09-12 10:24:12 +00:00
Yael Tsafrir	78fe0a651c	[X86] Lower _mm[256|512]_[mask[z]]_avg_epu[8|16] intrinsics to native llvm IR Differential Revision: https://reviews.llvm.org/D37560 llvm-svn: 313013	2017-09-12 07:50:35 +00:00
Roger Ferrer Ibanez	2551ff498d	[ARM] Fix typo when creating ISD::SUB nodes In D35192, I accidentally introduced a typo when creating ISD::SUB nodes, giving them two values instead of one. This fails when the merge_values combiner finds one of these nodes. This change fixes PR34564. Differential Revision: https://reviews.llvm.org/D37690 llvm-svn: 313010	2017-09-12 07:42:28 +00:00
Roger Ferrer Ibanez	4f7185b115	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 313009	2017-09-12 07:40:09 +00:00
Craig Topper	36bb424314	[X86] Fix typo in comment. NFC llvm-svn: 312990	2017-09-12 01:30:09 +00:00
Hans Wennborg	53fd82bbcc	Revert r312898 "[ARM] Use ADDCARRY / SUBCARRY" It caused PR34564. > This is a preparatory step for D34515 and also is being recommitted as its > first version caused PR34045. > > This change: > - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 > - lowering is done by first converting the boolean value into the carry flag > using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value > using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two > operations does the actual addition. > - for subtraction, given that ISD::SUBCARRY second result is actually a > borrow, we need to invert the value of the second operand and result before > and after using ARMISD::SUBE. We need to invert the carry result of > ARMISD::SUBE to preserve the semantics. > - given that the generic combiner may lower ISD::ADDCARRY and > ISD::SUBCARRYinto ISD::UADDO and ISD::USUBO we need to update their lowering > as well otherwise i64 operations now would require branches. This implies > updating the corresponding test for unsigned. > - add new combiner to remove the redundant conversions from/to carry flags > to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C > - fixes PR34045 > > Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312980	2017-09-11 23:52:02 +00:00
Yonghong Song	75e988b98f	bpf: add " ll" in the LD_IMM64 asmstring This partially reverts the previous fix in commit f5858045aa0b ("bpf: proper print imm64 expression in inst printer"). In that commit, the original suffix "ll" is removed from LD_IMM64 asmstring. In the custom print method, the "ll" suffix is printed if the rhs is an immediate. For example, "r2 = 5ll" => "r2 = 5ll", and "r3 = varll" => "r3 = var". This has an issue though for the assembler. Since the assembler relies on asmstring to do pattern matching, it will not be able to distinguish between "mov r2, 5" and "ld_imm64 r2, 5" since both asmstrings are "r2 = 5". In such cases, the assembler uses 64bit load for all "r = <val>" asm insts. This patch adds back " ll" suffix for ld_imm64 with one additional space for "#reg = #global_var" case. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 312978	2017-09-11 23:43:35 +00:00
Matt Arsenault	c6fab4ecd2	AMDGPU: Allow coldcc calls llvm-svn: 312936	2017-09-11 18:54:20 +00:00
Petar Jovanovic	a5ac789396	[mips][microMIPS] add lapc instruction Implement LAPC instruction for mips32r6, mips64r6 and micromips32r6. Patch by Milos Stojanovic. Differential Revision: https://reviews.llvm.org/D35984 llvm-svn: 312934	2017-09-11 18:34:04 +00:00
Stanislav Mekhanoshin	ac399c356c	[AMDGPU] Produce madak and madmk from the two-address pass These two instructions are normally selected, but when the two address pass converts mac into mad we end up with the mad where we could have one of these. Differential Revision: https://reviews.llvm.org/D37389 llvm-svn: 312928	2017-09-11 17:13:57 +00:00
Craig Topper	c923b2dacd	[X86] Remove portions of r275950 that are no longer needed with i1 not being a legal type Summary: r275950 added support for turning (trunc (X >> N) to i1) into BT(X, N). But that's no longer necessary now that i1 isn't legal. This patch removes the support for that, but preserves some of the refactorings done in that commit. Reviewers: guyblank, RKSimon, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37673 llvm-svn: 312925	2017-09-11 16:16:48 +00:00
Simon Pilgrim	8beb4573f2	[X86][SSE] Add support for X86ISD::PACKSS to ComputeNumSignBitsForTargetNode Helps improve combineLogicBlendIntoPBLENDV support by allowing us to peek through PACKSS truncations of vector comparison results. Differential Revision: https://reviews.llvm.org/D37680 llvm-svn: 312916	2017-09-11 14:03:47 +00:00
Tim Renouf	0ce7d42fef	[AMDGPU] exp should not be in WQM mode A mrt exp with vm=1 must be in exact (non-WQM) mode, as it also exports the exec mask as the valid mask to determine which pixels to render. This commit marks any exp as needing to be in exact mode. Actually, if there are multiple mrt exps, only one needs to have vm=1, and only that one needs to be in exact mode. But that is an optimization for another day. Differential Revision: https://reviews.llvm.org/D36305 llvm-svn: 312915	2017-09-11 13:55:39 +00:00
Andre Vieira	ff00b0b024	[ARM] Enable the use of SVC anywhere in an IT block Differential Revision: https://reviews.llvm.org/D37374 llvm-svn: 312908	2017-09-11 11:11:17 +00:00
Dylan McKay	880b08eb60	[AVR] Enable the '__do_copy_data' function Also enables '__do_clear_bss'. These functions are automatically called by the CRT if they are declared. We need these to be called otherwise RAM will start completely uninitialised, even though we need to copy RAM variables from progmem to RAM. llvm-svn: 312905	2017-09-11 10:32:51 +00:00
Igor Breger	3f42dd4a6f	[GlobalISel][X86] G_ANYEXT support. Summary: G_ANYEXT support Reviewers: zvi, delena Reviewed By: delena Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D37675 llvm-svn: 312903	2017-09-11 09:41:13 +00:00
Tim Renouf	a63255f185	AMDGPU: trivial comment change ... to check commit access for new committer. llvm-svn: 312900	2017-09-11 08:31:32 +00:00
Roger Ferrer Ibanez	c0927a36c4	[ARM] Use ADDCARRY / SUBCARRY This is a preparatory step for D34515 and also is being recommitted as its first version caused PR34045. This change: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) ← (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) ← (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) → C - fixes PR34045 Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 312898	2017-09-11 07:38:05 +00:00
Simon Pilgrim	7c4f3f69df	[X86][SSE] Tidyup + clang-format combineX86ShuffleChain call. NFCI. llvm-svn: 312887	2017-09-10 18:18:45 +00:00
Simon Pilgrim	45250fa319	[X86][SSE] Move combineTo call out of combineX86ShufflesConstants. NFCI. Move towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result. llvm-svn: 312886	2017-09-10 18:10:49 +00:00
Simon Pilgrim	add6eb986b	[X86][SSE] Move combineTo call out of combineX86ShuffleChain. NFCI. First step towards making it possible to use the shuffle combines for cases where we don't want to call DCI.CombineTo() with the result. llvm-svn: 312884	2017-09-10 14:06:41 +00:00
Coby Tayree	c27f451050	[X86][X86AsmParser] adding const on InlineAsmIdentifierInfo in CreateMemForInlineAsm. NFC. llvm-svn: 312881	2017-09-10 12:21:24 +00:00
Uriel Korach	b8b7209ffa	Revert "adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics" This reverts commit r312879 - An accidental partial commit. llvm-svn: 312880	2017-09-10 09:07:21 +00:00
Uriel Korach	34f7febfe4	adding autoUpgrade support to broadcast[f\|i]32x2 intrinsics llvm-svn: 312879	2017-09-10 08:40:13 +00:00
Craig Topper	d87f426279	[X86] Don't disable slow INC/DEC if optimizing for size Summary: Just because INC/DEC is a little slow on some processors doesn't mean we shouldn't prefer it when optimizing for size. This appears to match gcc behavior. Reviewers: chandlerc, zvi, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37177 llvm-svn: 312866	2017-09-09 17:11:59 +00:00
Sanjay Patel	554ba948c2	[DivRempairs] add a pass to optimize div/rem pairs (PR31028) This is intended to be a superset of the functionality from D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope and feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910. And I initially had almost this same functionality as an addition to CGP in the motivating example of PR31028: https://bugs.llvm.org/show_bug.cgi?id=31028 The advantage of positioning this ahead of SimplifyCFG in the pass pipeline is that it can allow more flattening. But it needs to be after passes (InstCombine) that could sink a div/rem and undo the hoisting that is done here. Decomposing remainder may allow removing some code from the backend (PPC and possibly others). Differential Revision: https://reviews.llvm.org/D37121 llvm-svn: 312862	2017-09-09 13:38:18 +00:00

43983 Commits