llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
Matt Arsenault	8858f865a6	AMDGPU: Add macro fusion schedule DAG mutation Try to increase opportunities to shrink vcc uses. llvm-svn: 307313	2017-07-06 20:57:05 +00:00
Quentin Colombet	053ca5f07d	[AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget. NFC llvm-svn: 307186	2017-07-05 18:40:56 +00:00
Alexander Timofeev	71cf453d98	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097	2017-07-04 17:32:00 +00:00
NAKAMURA Takumi	e6c7524092	Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default" It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054	2017-07-04 02:14:18 +00:00
Alexander Timofeev	ef2bf78d2f	[AMDGPU] Switch scalarize global loads ON by default Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307026	2017-07-03 14:54:11 +00:00
Matt Arsenault	9eac763e7b	AMDGPU: Remove SITypeRewriter This was an old workaround for using v16i8 in some old intrinsics for resource descriptors. llvm-svn: 306603	2017-06-28 21:38:50 +00:00
Stanislav Mekhanoshin	a2f5f5f7c9	[AMDGPU] Add infer address spaces pass before SROA It adds it for the target after inlining but before SROA where we can get most out of it. Differential Revision: https://reviews.llvm.org/D34366 llvm-svn: 305759	2017-06-19 23:17:36 +00:00
Chandler Carruth	eb66b33867	Sort the remaining #include lines in include/... and lib/.... I did this a long time ago with a janky python script, but now clang-format has built-in support for this. I fed clang-format every line with a #include and let it re-sort things according to the precise LLVM rules for include ordering baked into clang-format these days. I've reverted a number of files where the results of sorting includes isn't healthy. Either places where we have legacy code relying on particular include ordering (where possible, I'll fix these separately) or where we have particular formatting around #include lines that I didn't want to disturb in this patch. This patch is entirely mechanical. If you get merge conflicts or anything, just ignore the changes in this patch and run clang-format over your #include lines in the files. Sorry for any noise here, but it is important to keep these things stable. I was seeing an increasing number of patches with irrelevant re-ordering of #include lines because clang-format was used. This patch at least isolates that churn, makes it easy to skip when resolving conflicts, and gets us to a clean baseline (again). llvm-svn: 304787	2017-06-06 11:49:48 +00:00
Stanislav Mekhanoshin	9cf9511d2e	[AMDGPU] Untangle SDWA pass from SIShrinkInstructions Remove dependency of SDWA pass on SIShrinkInstructions. The goal is to move SDWA even higher in the stack to avoid second run of MachineLICM, MachineCSE and SIFoldOperands. Also added handling to preserve original src modifiers. Differential Revision: https://reviews.llvm.org/D33860 llvm-svn: 304665	2017-06-03 17:39:47 +00:00
Matt Arsenault	cac04b0f5e	AMDGPU: Register AMDGPUAlwaysInline llvm-svn: 304574	2017-06-02 18:02:42 +00:00
Mark Searles	ecdf6429da	[AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests. -enable-si-insert-waitcnts=1 becomes the default -enable-si-insert-waitcnts=0 to use old pass Differential Revision: https://reviews.llvm.org/D33730 llvm-svn: 304551	2017-06-02 14:19:25 +00:00
Matthias Braun	7b8d690af1	TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC TargetPassConfig is not useful for targets that do not use the CodeGen library, so we may just as well store a pointer to an LLVMTargetMachine instead of just to a TargetMachine. While at it, also change the constructor to take a reference instead of a pointer as the TM must not be nullptr. llvm-svn: 304247	2017-05-30 21:36:41 +00:00
Stanislav Mekhanoshin	e23fa40f7f	[AMDGPU] Allow SDWA in instructions with immediates and SGPRs An encoding does not allow to use SDWA in an instruction with scalar operands, either literals or SGPRs. That is however possible to copy these operands into a VGPR first. Several copies of the value are produced if multiple SDWA conversions were done. To cleanup MachineLICM (to hoist copies out of loops), MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace SGPR to VGPR copy with immediate copy right to the VGPR) runs are added after the SDWA pass. Differential Revision: https://reviews.llvm.org/D33583 llvm-svn: 304219	2017-05-30 16:49:24 +00:00
Francis Visoiu Mistrih	5f6c901f02	[LegacyPassManager] Remove TargetMachine constructors This provides a new way to access the TargetMachine through TargetPassConfig, as a dependency. The patterns replaced here are: * Passes handling a null TargetMachine call `getAnalysisIfAvailable<TargetPassConfig>`. * Passes not handling a null TargetMachine `addRequired<TargetPassConfig>` and call `getAnalysis<TargetPassConfig>`. * MachineFunctionPasses now use MF.getTarget(). * Remove all the TargetMachine constructors. * Remove INITIALIZE_TM_PASS. This fixes a crash when running `llc -start-before prologepilog`. PEI needs StackProtector, which gets constructed without a TargetMachine by the pass manager. The StackProtector pass doesn't handle the case where there is no TargetMachine, so it segfaults. Related to PR30324. Differential Revision: https://reviews.llvm.org/D33222 llvm-svn: 303360	2017-05-18 17:21:13 +00:00
Jan Sjodin	6fb09ca5d3	Re-submit AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303111	2017-05-15 20:18:37 +00:00
Jan Sjodin	4f52af0e05	Revert 303091. llvm-svn: 303098	2017-05-15 18:39:47 +00:00
Jan Sjodin	53e05436a9	Add AMDGPUMachineCFGStructurizer. Differential Revision: https://reviews.llvm.org/D23209 llvm-svn: 303091	2017-05-15 18:13:56 +00:00
Marek Olsak	740cda5978	AMDGPU: Add AMDGPU_HS calling convention Reviewers: arsenm, nhaehnle Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D32644 llvm-svn: 301930	2017-05-02 15:41:10 +00:00
Stanislav Mekhanoshin	51dba4fa40	[AMDGPU] Generate range metadata for workitem id If workgroup size is known inform llvm about range returned by local id and local size queries. Differential Revision: https://reviews.llvm.org/D31804 llvm-svn: 300102	2017-04-12 20:48:56 +00:00
Kannan Narayanan	fd367b26fb	[AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing. Based on comments in https://reviews.llvm.org/D31161. llvm-svn: 300023	2017-04-12 03:25:12 +00:00
Yaxun Liu	f386491b0f	[AMDGPU] Add A5 to data layout for amdgiz environment Differential Revision: https://reviews.llvm.org/D31589 llvm-svn: 299964	2017-04-11 17:18:13 +00:00
Sam Kolton	5871a5f1d7	[AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes Summary: Difference between PreRegAlloc() and MachineSSAOptimization() is that the former is run regardless of -O0 optimization level. In my understanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled. With this change order of passes will not change. Reviewers: arsenm, vpykhtin, rampitec Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31705 llvm-svn: 299757	2017-04-07 10:53:12 +00:00
Yaxun Liu	3bf809b0e4	[AMDGPU] Temporarily change constant address space from 4 to 2 Our final address space mapping is to let constant address space to be 4 to match nvptx. However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib about intrinsics returning constant pointers. Differential Revision: https://reviews.llvm.org/D31770 llvm-svn: 299690	2017-04-06 19:17:32 +00:00
Sam Kolton	0b70fd1739	[AMDGPU] Resubmit SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299654	2017-04-06 15:03:28 +00:00
Ivan Krasin	f25940dbd7	Revert r299536. [AMDGPU] SDWA peephole: enable by default. Reason: breaks multiple bots: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988 http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173 Original Review URL: https://reviews.llvm.org/D31671 llvm-svn: 299583	2017-04-05 19:58:12 +00:00
Sam Kolton	5469f4b9e9	[AMDGPU] SDWA peephole: enable by default Reviewers: vpykhtin, rampitec, arsenm Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D31671 llvm-svn: 299536	2017-04-05 12:00:45 +00:00
Stanislav Mekhanoshin	c0ea8a08d6	[AMDGPU] Add GlobalOpt parameter to Always Inliner pass If set to false it does not remove global aliases. With this parameter set to false it should be safe to run the pass before link. Differential Revision: https://reviews.llvm.org/D31489 llvm-svn: 299108	2017-03-30 20:16:02 +00:00
Stanislav Mekhanoshin	cd4bc62dff	[AMDGPU] Split -amdgpu-early-inline-all option Previously it was covered by the internalization. It turns out we cannot run internalizer in FE, it breaks separate compilation tests. Thus early inliner gets its own option. Differential Revision: https://reviews.llvm.org/D31429 llvm-svn: 298935	2017-03-28 18:23:24 +00:00
Yaxun Liu	da52f0e643	[AMDGPU] Get address space mapping by target triple environment As we introduced target triple environment amdgiz and amdgizcl, the address space values are no longer enums. We have to decide the value by target triple. The basic idea is to use struct AMDGPUAS to represent address space values. For address space values which are not depend on target triple, use static const members, so that they don't occupy extra memory space and is equivalent to a compile time constant. Since the struct is lightweight and cheap, it can be created on the fly at the point of usage. Or it can be added as member to a pass and created at the beginning of the run* function. Differential Revision: https://reviews.llvm.org/D31284 llvm-svn: 298846	2017-03-27 14:04:01 +00:00
Yaxun Liu	616a824961	[AMDGPU] Switch data layout by triple environment amdgiz Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0. amdgiz is for non-OpenCL environment where generic address space is 0. amdgizcl is for OpenCL environment where generic address space is 0. Differential Revision: https://reviews.llvm.org/D31211 llvm-svn: 298758	2017-03-25 02:05:44 +00:00
Matt Arsenault	594fa2719f	AMDGPU: Unify divergent function exits. StructurizeCFG can't handle cases with multiple returns creating regions with multiple exits. Create a copy of UnifyFunctionExitNodes that only unifies exit nodes that skips exit nodes with uniform branch sources. llvm-svn: 298729	2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin	3bf0017558	[AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline Previously it was added only to the BE. Differential Revision: https://reviews.llvm.org/D31323 llvm-svn: 298721	2017-03-24 18:01:14 +00:00
Valery Pykhtin	fd341e28f3	[AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler Differential revision: https://reviews.llvm.org/D31046 llvm-svn: 298368	2017-03-21 13:15:46 +00:00
Sam Kolton	fcb49c3b8d	[AMDGPU] SDWA peephole optimization pass. Summary: First iteration of SDWA peephole. This pass tries to combine several instruction into one SDWA instruction. E.g. it converts: ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1 V_ADD_I32_e32 %vreg2, %vreg0, %vreg3 V_LSHLREV_B32_e32 %vreg4, 16, %vreg2 ''' Into: ''' V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD ''' Pass structure: 1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''. 2. Iterate over found SDWA operands and find instruction that could be potentially converted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0''' 3. Iterate over all potential instructions and check if they can be converted into SDWA. 4. Convert instructions to SDWA. This review contains basic implementation of SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done). There are several ways this pass can be improved: 1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass. 2. Introduce more SDWA patterns 3. Introduce mnemonics to limit when SDWA patterns should apply Reviewers: vpykhtin, alex-t, arsenm, rampitec Subscribers: wdng, nhaehnle, mgorny Differential Revision: https://reviews.llvm.org/D30038 llvm-svn: 298365	2017-03-21 12:51:34 +00:00
Konstantin Zhuravlyov	c2d33a361e	[AMDGPU] Run always inliner early in opt Differential Revision: https://reviews.llvm.org/D31141 llvm-svn: 298281	2017-03-20 18:06:45 +00:00
Konstantin Zhuravlyov	6fe40181b8	Revert "[AMDGPU] Run always inliner early in opt" This reverts commit r297958, it breaks device-libs build. llvm-svn: 298239	2017-03-20 09:26:08 +00:00
Stanislav Mekhanoshin	123733906c	[AMDGPU] Add address space based alias analysis pass This is direct port of HSAILAliasAnalysis pass, just cleaned for style and renamed. Differential Revision: https://reviews.llvm.org/D31103 llvm-svn: 298172	2017-03-17 23:56:58 +00:00
Stanislav Mekhanoshin	1c042b3ab9	Only unswitch loops with uniform conditions Loop unswitching can be extremely harmful for a SIMT target. In case if hoisted condition is not uniform a SIMT machine will execute both clones of a loop sequentially. Therefore LoopUnswitch checks if the condition is non-divergent. Since DivergenceAnalysis adds an expensive PostDominatorTree analysis not needed for non-SIMT targets a new option is added to avoid unneeded analysis initialization. The method getAnalysisUsage is called when TargetTransformInfo is not yet available and we cannot use it here. For that reason a new field DivergentTarget is added to PassManagerBuilder to control the behavior and set this field from a target. Differential Revision: https://reviews.llvm.org/D30796 llvm-svn: 298104	2017-03-17 17:13:41 +00:00
Stanislav Mekhanoshin	e7e6d76e45	[AMDGPU] Run always inliner early in opt We can mark functions to always inline early in the opt. Since we do not have call support this early inlining creates opportunities for inter-procedural optimizations which would not occur otherwise. Differential Revision: https://reviews.llvm.org/D31016 llvm-svn: 297958	2017-03-16 16:11:46 +00:00
Matt Arsenault	a207e31c14	AMDGPU: Merge initial gfx9 support llvm-svn: 295554	2017-02-18 18:29:53 +00:00
Stanislav Mekhanoshin	b83595fd3c	[AMDGPU] Revert failed scheduling This patch reverts region's scheduling to the original untouched state in case if we have decreased occupancy. In addition it switches to use TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206	2017-02-15 17:19:50 +00:00
Matt Arsenault	478a09d3d1	AMDGPU: Add pass to expand memcpy/memmove/memset llvm-svn: 294635	2017-02-09 22:00:42 +00:00
Matt Arsenault	15bde0a3d5	AMDGPU: Enable InferAddressSpaces llvm-svn: 294408	2017-02-08 06:16:04 +00:00
Tom Stellard	f2ec17e0e6	Re-commit AMDGPU/GlobalISel: Add support for simple shaders Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551	2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin	0a8e20606c	[AMDGPU] Internalize non-kernel symbols Since we have no call support and late linking we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run Internalize pass early in the opt pipeline followed by global DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 llvm-svn: 293549	2017-01-30 21:05:18 +00:00
Matt Arsenault	335594976a	AMDGPU: Run AMDGPUCodeGenPrepare after inlining With leaf functions, this makes nonsensical decisions based on the uniformity of the arguments. llvm-svn: 293525	2017-01-30 18:40:29 +00:00
Tom Stellard	d839aa304c	Revert "AMDGPU/GlobalISel: Add support for simple shaders" This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509	2017-01-30 17:42:41 +00:00
Tom Stellard	ca8f087f31	AMDGPU/GlobalISel: Add support for simple shaders Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503	2017-01-30 17:09:15 +00:00
Stanislav Mekhanoshin	32e634e989	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass With the adjustPassManager interface that is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 llvm-svn: 293300	2017-01-27 16:38:10 +00:00
Stanislav Mekhanoshin	4b31377e87	Replace addEarlyAsPossiblePasses callback with adjustPassManager This change introduces adjustPassManager target callback giving a target an opportunity to tweak PassManagerBuilder before pass managers are populated. This generalizes and replaces addEarlyAsPossiblePasses target callback. In particular that can be used to add custom passes to extension points other than EP_EarlyAsPossible. Differential Revision: https://reviews.llvm.org/D28336 llvm-svn: 293189	2017-01-26 16:49:08 +00:00


139 Commits