llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

Author	SHA1	Message	Date
George Burgess IV	40c3d6a6b6	Let llvm.objectsize be conservative with null pointers This adds a parameter to @llvm.objectsize that makes it return conservative values if it's given null. This fixes PR23277. Differential Revision: https://reviews.llvm.org/D28494 llvm-svn: 298430	2017-03-21 20:08:59 +00:00
Marek Olsak	42baf8d603	AMDGPU: Buffer descriptor changes for GFX9 Reviewers: arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31158 llvm-svn: 298397	2017-03-21 17:00:39 +00:00
Marek Olsak	4bf1e53d20	AMDGPU: Always use VGPR indexing on GFX9 Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr Differential Revision: https://reviews.llvm.org/D31157 llvm-svn: 298396	2017-03-21 17:00:32 +00:00
Reid Kleckner	27d17d1713	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding to the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Matt Arsenault	af83d590a5	AMDGPU: Fix not including v2i16/v2f16 in register class llvm-svn: 298390	2017-03-21 16:42:50 +00:00
Matt Arsenault	489b4bdeba	AMDGPU: Fix asserting on 0 dmask for image intrinsics Fold these to undef during lowering so users get eliminated. llvm-svn: 298387	2017-03-21 16:32:17 +00:00
Valery Pykhtin	fd341e28f3	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368	2017-03-21 13:15:46 +00:00
Sam Kolton	fcb49c3b8d	[AMDGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instructions into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instructions in a basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match a machine instruction into either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on the whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365	2017-03-21 12:51:34 +00:00
Konstantin Zhuravlyov	c2d33a361e	[AMDGPU] Run always inliner early in opt Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281	2017-03-20 18:06:45 +00:00
Dmitry Preobrazhensky	4d71db8c7a	[AMDGPU][MC] Fix for Bugs 28201, 28199, 28170 + LIT tests This fix enables sp3 abs modifier with constants Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30825 llvm-svn: 298265	2017-03-20 16:33:20 +00:00
Dmitry Preobrazhensky	8fd7a07447	[AMDGPU][MC] Fix for Bugs 28200, 28202 + LIT tests Fixed several related issues with VOP3 fp modifiers. Reviewers: artem.tamazov Differential Revision: https://reviews.llvm.org/D30821 llvm-svn: 298255	2017-03-20 14:50:35 +00:00
Konstantin Zhuravlyov	6fe40181b8	Revert "[AMDGPU] Run always inliner early in opt" This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239	2017-03-20 09:26:08 +00:00
Simon Pilgrim	fd15ce160a	Fix MSVC warning: "switch statement contains 'default' but no 'case' labels". NFCI. llvm-svn: 298225	2017-03-19 16:39:04 +00:00
Stanislav Mekhanoshin	123733906c	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Matt Arsenault	411733acd6	AMDGPU: Fix broken condition in hazard recognizer Fixes bug 32248. llvm-svn: 298125	2017-03-17 21:36:28 +00:00
Matt Arsenault	bf7fbe0afa	AMDGPU: Fix handling of constant phi input loop conditions If the loop condition was an i1 phi with a constantexpr input, this would add a loop intrinsic fed by a phi dependent on a call to if.break in the same block. Insert the call in the loop header. llvm-svn: 298121	2017-03-17 20:52:21 +00:00
Matt Arsenault	f64359e67f	AMDGPU: Cleanup control flow intrinsics Move backend internal intrinsics along with the rest of the normal intrinsics, and use the Intrinsic::getDeclaration API instead of manually constructing the type list. It's surprising this was working before. fdiv.fast had the wrong number of parameters. The control flow intrinsic declaration attributes were not being applied, and their types were inconsistent. The actual IR use types did not match the declaration, and were closer to the types used for the patterns. The brcond lowering was changing the types, so introduce new nodes for those. llvm-svn: 298119	2017-03-17 20:41:45 +00:00
Stanislav Mekhanoshin	1c042b3ab9	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. If the hoisted condition is not uniform, a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets, a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available, so we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior, and this field is set from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Stanislav Mekhanoshin	e7e6d76e45	[AMDGPU] Run always inliner early in opt We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958	2017-03-16 16:11:46 +00:00
Matt Arsenault	92b2f9ae49	AMDGPU: Allow sinking of addressing modes for atomic_inc/dec llvm-svn: 297913	2017-03-15 23:15:12 +00:00
Matt Arsenault	831029e78d	AMDGPU: Fix unnecessary ands when packing f16 vectors computeKnownBits didn't handle fp_to_fp16 to report the high bits as 0. ARM maps the generic node to an instruction that does not modify the high bits of the register, so introduce a target node where the high bits are known 0. llvm-svn: 297873	2017-03-15 19:04:26 +00:00
Matt Arsenault	73103f1a0d	AMDGPU: Minor SIAnnotateControlFlow cleanups Newline fixes, early return, range loops. llvm-svn: 297865	2017-03-15 18:00:12 +00:00
Sanjay Patel	093a8cf070	Cyle -> Cycle; NFCI llvm-svn: 297846	2017-03-15 15:37:42 +00:00
Simon Pilgrim	2fe92b56b2	Reverted unintended commit llvm-svn: 297841	2017-03-15 14:47:30 +00:00
Simon Pilgrim	eebd5318fe	Fix Wint-in-bool-context warning (PR32248) llvm-svn: 297840	2017-03-15 14:38:19 +00:00
Matt Arsenault	7b61ea7700	AMDGPU: Re-use TM.getNullPointerValue llvm-svn: 297662	2017-03-13 20:18:14 +00:00
Matt Arsenault	78e7d66a36	AMDGPU: Treat 0 as private null pointer in addrspacecast lowering llvm-svn: 297658	2017-03-13 19:47:31 +00:00
Matt Arsenault	0ccffffd0a	AMDGPU: Remove packf16 intrinsic llvm-svn: 297557	2017-03-11 05:51:16 +00:00
Matt Arsenault	2e752c587f	AMDGPU: Keep track of modifiers when converting v_mac to v_mad Since v_max_f32_e64/v_max_f16_e64 can be folded if the target instruction supports the clamp bit, we also need to maintain modifiers when converting v_mac to v_mad. This fixes a rendering issue with Dirt Rally because a v_mac instruction with the clamp bit set was converted to a v_mad but that bit was lost during the conversion. Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit") Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 297556	2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin	3b7738cafe	[AMDGPU] Remove getBidirectionalReasonRank This method inverts the Reason field of a scheduling candidate. It does right comparison between RegCritical and RegExcess, but everything else is broken. In fact it can prefer less strong reason such as Weak over RegCritical because Weak > -RegCritical. The CandReason enum is properly sorted, so just remove artificial ranking. Differential Revision: https://reviews.llvm.org/D30557 llvm-svn: 297536	2017-03-11 00:29:27 +00:00
Konstantin Zhuravlyov	7e655f8bc7	[AMDGPU] Split R600/SI getFrameIndexReference and emit stack object offsets for SI Differential Revision: https://reviews.llvm.org/D29674 llvm-svn: 297499	2017-03-10 19:39:07 +00:00
Yaxun Liu	d6041532b8	Rename PT_NOTE namespace name used in AMDGPUPTNote.h Patch by Guansong Zhang. Differential Revision: https://reviews.llvm.org/D30750 llvm-svn: 297498	2017-03-10 19:35:43 +00:00
Changpeng Fang	31a069303d	AMDGPU/SI: Disable unrolling in the loop vectorizer if the loop is not vectorized. Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D30719 llvm-svn: 297328	2017-03-09 00:07:00 +00:00
Matt Arsenault	b3bd0133e4	AMDGPU: Don't wait at end of block with a trivial successor If there is only one successor, and that successor only has one predecessor the wait can obviously be delayed until uses or the end of the next block. This avoids code quality regressions when there are trivial fallthrough blocks inserted for structurization. llvm-svn: 297251	2017-03-08 01:06:58 +00:00
Matt Arsenault	7b541d6cfd	AMDGPU: Constant fold rcp node When doing arcp optimization with a constant denominator, this was leaving behind rcps with constant inputs. llvm-svn: 297248	2017-03-08 00:48:46 +00:00
Changpeng Fang	269b012924	AMDGPU/SI: Do not insert EndCf in an unreachable block Reviewers: arsenm Differential Revision: http://reviews.llvm.org/D22025 llvm-svn: 297243	2017-03-07 23:29:36 +00:00
Daniel Sanders	ffb113ee36	Recommit: [globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. The problem with the previous commit appears to have been that TableGen was including CodeGen/LowLevelType.h instead of Support/LowLevelTypeImpl.h. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297241	2017-03-07 23:20:35 +00:00
Daniel Sanders	fa8669c472	Revert r297177: Change LLT constructor string into an LLT-based object ... More module problems. This time it only showed up in the stage 2 compile of clang-x86_64-linux-selfhost-modules-2 but not the stage 1 compile. Somehow, this change causes the build to need Attributes.gen before it's been generated. llvm-svn: 297188	2017-03-07 19:21:23 +00:00
Daniel Sanders	ffeec3d802	[globalisel] Change LLT constructor string into an LLT-based object that knows how to generate it. Summary: This will allow future patches to inspect the details of the LLT. The implementation is now split between the Support and CodeGen libraries to allow TableGen to use this class without introducing layering concerns. Thanks to Ahmed Bougacha for finding a reasonable way to avoid the layering issue and providing the version of this patch without that problem. Reviewers: t.p.northover, qcolombet, rovka, aditya_nandakumar, ab, javed.absar Subscribers: arsenm, nhaehnle, mgorny, dberris, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30046 llvm-svn: 297177	2017-03-07 18:32:25 +00:00
Konstantin Zhuravlyov	4545081b47	Revert "AMDGPU: Set MCAsmInfo::PointerSize" It breaks line tables because the patch is not complete, working on a complete one at the moment This reverts commit r294031. llvm-svn: 297118	2017-03-07 04:44:33 +00:00
Jan Vesely	af435ac28b	AMDGPU/R600: Fix ALU clause markers use detection also exit early on kill instead of redefinition. Differential Revision: https://reviews.llvm.org/D30230 llvm-svn: 297060	2017-03-06 20:10:05 +00:00
Krzysztof Parzyszek	d151d23d21	Make TargetInstrInfo::isPredicable take a const reference, NFC llvm-svn: 296901	2017-03-03 18:30:54 +00:00
Dmitry Preobrazhensky	38ef58d587	[AMDGPU][MC] Fix for Bug 30829 + LIT tests Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction). Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code). Added LIT tests. llvm-svn: 296873	2017-03-03 14:31:06 +00:00
Matt Arsenault	bd52eeb9f7	AMDGPU: Fix missing dominator tree dependency llvm-svn: 296842	2017-03-02 23:50:51 +00:00
Matt Arsenault	3218c65205	AMDGPU: Fix types for VOP_I16_I16_I16 llvm-svn: 296523	2017-02-28 21:31:45 +00:00
Matt Arsenault	2915cfd231	AMDGPU: Add definition for v_swap_b32 This is somewhat tricky because there are two pairs of tied operands, and it isn't allowed to be VOP3 encoded. llvm-svn: 296519	2017-02-28 21:09:04 +00:00
Matt Arsenault	edcdadeb9c	AMDGPU: Add definition for v_xad_u32 llvm-svn: 296515	2017-02-28 20:27:30 +00:00
Matt Arsenault	a82f04fd90	AMDGPU: Add ds_nop to assembler llvm-svn: 296513	2017-02-28 20:15:46 +00:00
Matt Arsenault	b3df1c0b78	AMDGPU: Add definitions for ds_{read\|write}_b{96\|128} It's not clear to me if this is always better than doing ds_write2_b64 This adds the constraint of a 128-bit register input instead of a pair of 64-bit. llvm-svn: 296512	2017-02-28 20:15:43 +00:00
Stanislav Mekhanoshin	c882b11e5d	[AMDGPU] Add second pass of the scheduler If during scheduling we have identified that we cannot keep optimistic occupancy, increase the critical register pressure limit and try scheduling the whole function again. In this case blocks with smaller pressure will have a chance for better scheduling. Differential Revision: https://reviews.llvm.org/D30442 llvm-svn: 296506	2017-02-28 19:20:33 +00:00


1626 Commits