1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00
Commit Graph

139 Commits

Author SHA1 Message Date
Matt Arsenault
8858f865a6 AMDGPU: Add macro fusion schedule DAG mutation
Try to increase opportunities to shrink vcc uses.

llvm-svn: 307313
2017-07-06 20:57:05 +00:00
Quentin Colombet
053ca5f07d [AMDGPU] Move GISel accessor initialization from TargetMachine to Subtarget.
NFC

llvm-svn: 307186
2017-07-05 18:40:56 +00:00
Alexander Timofeev
71cf453d98 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
e6c7524092 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

llvm-svn: 307054
2017-07-04 02:14:18 +00:00
Alexander Timofeev
ef2bf78d2f [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307026
2017-07-03 14:54:11 +00:00
Matt Arsenault
9eac763e7b AMDGPU: Remove SITypeRewriter
This was an old workaround for using v16i8 in some old intrinsics
for resource descriptors.

llvm-svn: 306603
2017-06-28 21:38:50 +00:00
Stanislav Mekhanoshin
a2f5f5f7c9 [AMDGPU] Add infer address spaces pass before SROA
It adds it for the target after inlining but before SROA where
we can get most out of it.

Differential Revision: https://reviews.llvm.org/D34366

llvm-svn: 305759
2017-06-19 23:17:36 +00:00
Chandler Carruth
eb66b33867 Sort the remaining #include lines in include/... and lib/....
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.

I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.

This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.

Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).

llvm-svn: 304787
2017-06-06 11:49:48 +00:00
Stanislav Mekhanoshin
9cf9511d2e [AMDGPU] Untangle SDWA pass from SIShrinkInstructions
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.

Also added handling to preserve original src modifiers.

Differential Revision: https://reviews.llvm.org/D33860

llvm-svn: 304665
2017-06-03 17:39:47 +00:00
Matt Arsenault
cac04b0f5e AMDGPU: Register AMDGPUAlwaysInline
llvm-svn: 304574
2017-06-02 18:02:42 +00:00
Mark Searles
ecdf6429da [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.
-enable-si-insert-waitcnts=1 becomes the default
-enable-si-insert-waitcnts=0 to use old pass

Differential Revision: https://reviews.llvm.org/D33730

llvm-svn: 304551
2017-06-02 14:19:25 +00:00
Matthias Braun
7b8d690af1 TargetPassConfig: Keep a reference to an LLVMTargetMachine; NFC
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.

While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.

llvm-svn: 304247
2017-05-30 21:36:41 +00:00
Stanislav Mekhanoshin
e23fa40f7f [AMDGPU] Allow SDWA in instructions with immediates and SGPRs
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.

Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.

Differential Revision: https://reviews.llvm.org/D33583

llvm-svn: 304219
2017-05-30 16:49:24 +00:00
Francis Visoiu Mistrih
5f6c901f02 [LegacyPassManager] Remove TargetMachine constructors
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.

The patterns replaced here are:

* Passes handling a null TargetMachine call
  `getAnalysisIfAvailable<TargetPassConfig>`.

* Passes not handling a null TargetMachine
  `addRequired<TargetPassConfig>` and call
  `getAnalysis<TargetPassConfig>`.

* MachineFunctionPasses now use MF.getTarget().

* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.

This fixes a crash when running `llc -start-before prologepilog`.

PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.

Related to PR30324.

Differential Revision: https://reviews.llvm.org/D33222

llvm-svn: 303360
2017-05-18 17:21:13 +00:00
Jan Sjodin
6fb09ca5d3 Re-submit AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303111
2017-05-15 20:18:37 +00:00
Jan Sjodin
4f52af0e05 Revert 303091.
llvm-svn: 303098
2017-05-15 18:39:47 +00:00
Jan Sjodin
53e05436a9 Add AMDGPUMachineCFGStructurizer.
Differential Revision: https://reviews.llvm.org/D23209

llvm-svn: 303091
2017-05-15 18:13:56 +00:00
Marek Olsak
740cda5978 AMDGPU: Add AMDGPU_HS calling convention
Reviewers: arsenm, nhaehnle

Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D32644

llvm-svn: 301930
2017-05-02 15:41:10 +00:00
Stanislav Mekhanoshin
51dba4fa40 [AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local
id  and local size queries.

Differential Revision: https://reviews.llvm.org/D31804

llvm-svn: 300102
2017-04-12 20:48:56 +00:00
Kannan Narayanan
fd367b26fb [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.
Based on comments in https://reviews.llvm.org/D31161.

llvm-svn: 300023
2017-04-12 03:25:12 +00:00
Yaxun Liu
f386491b0f [AMDGPU] Add A5 to data layout for amdgiz environment
Differential Revision: https://reviews.llvm.org/D31589

llvm-svn: 299964
2017-04-11 17:18:13 +00:00
Sam Kolton
5871a5f1d7 [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

llvm-svn: 299757
2017-04-07 10:53:12 +00:00
Yaxun Liu
3bf809b0e4 [AMDGPU] Temporarily change constant address space from 4 to 2
Our final address space mapping is to let constant address space to be 4 to match nvptx.
However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib
about intrinsics returning constant pointers.

Differential Revision: https://reviews.llvm.org/D31770

llvm-svn: 299690
2017-04-06 19:17:32 +00:00
Sam Kolton
0b70fd1739 [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299654
2017-04-06 15:03:28 +00:00
Ivan Krasin
f25940dbd7 Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671

llvm-svn: 299583
2017-04-05 19:58:12 +00:00
Sam Kolton
5469f4b9e9 [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

llvm-svn: 299536
2017-04-05 12:00:45 +00:00
Stanislav Mekhanoshin
c0ea8a08d6 [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

llvm-svn: 299108
2017-03-30 20:16:02 +00:00
Stanislav Mekhanoshin
cd4bc62dff [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

llvm-svn: 298935
2017-03-28 18:23:24 +00:00
Yaxun Liu
da52f0e643 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284

llvm-svn: 298846
2017-03-27 14:04:01 +00:00
Yaxun Liu
616a824961 [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211

llvm-svn: 298758
2017-03-25 02:05:44 +00:00
Matt Arsenault
594fa2719f AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

llvm-svn: 298729
2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin
3bf0017558 [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline
Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

llvm-svn: 298721
2017-03-24 18:01:14 +00:00
Valery Pykhtin
fd341e28f3 [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
2017-03-21 13:15:46 +00:00
Sam Kolton
fcb49c3b8d [ADMGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
    1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
    3. Iterate over all potential instructions and check if they can be converted into SDWA.
    4. Convert instructions to SDWA.

This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
    1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
2017-03-21 12:51:34 +00:00
Konstantin Zhuravlyov
c2d33a361e [AMDGPU] Run always inliner early in opt
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
2017-03-20 18:06:45 +00:00
Konstantin Zhuravlyov
6fe40181b8 Revert "[AMDGPU] Run always inliner early in opt"
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
2017-03-20 09:26:08 +00:00
Stanislav Mekhanoshin
123733906c [AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172
2017-03-17 23:56:58 +00:00
Stanislav Mekhanoshin
1c042b3ab9 Only unswitch loops with uniform conditions
Loop unswitching can be extremely harmful for a SIMT target. In case
if hoisted condition is not uniform a SIMT machine will execute both
clones of a loop sequentially. Therefor LoopUnswitch checks if the
condition is non-divergent.

Since DivergenceAnalysis adds an expensive PostDominatorTree analysis
not needed for non-SIMT targets a new option is added to avoid unneded
analysis initialization. The method getAnalysisUsage is called when
TargetTransformInfo is not yet available and we cannot use it here.
For that reason a new field DivergentTarget is added to PassManagerBuilder
to control the behavior and set this field from a target.

Differential Revision: https://reviews.llvm.org/D30796

llvm-svn: 298104
2017-03-17 17:13:41 +00:00
Stanislav Mekhanoshin
e7e6d76e45 [AMDGPU] Run always inliner early in opt
We can mark functions to always inline early in the opt. Since we do not have
call support this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958
2017-03-16 16:11:46 +00:00
Matt Arsenault
a207e31c14 AMDGPU: Merge initial gfx9 support
llvm-svn: 295554
2017-02-18 18:29:53 +00:00
Stanislav Mekhanoshin
b83595fd3c [AMDGPU] Revert failed scheduling
This patch reverts region's scheduling to the original untouched state
in case if we have have decreased occupancy.

In addition it switches to use TargetRegisterInfo occupancy callback
for pressure limits instead of gradually increasing limits which were
just passed by. We are going to stay with the best schedule so we do
not need to tolerate worsened scheduling anymore.

Differential Revision: https://reviews.llvm.org/D29971

llvm-svn: 295206
2017-02-15 17:19:50 +00:00
Matt Arsenault
478a09d3d1 AMDGPU: Add pass to expand memcpy/memmove/memset
llvm-svn: 294635
2017-02-09 22:00:42 +00:00
Matt Arsenault
15bde0a3d5 AMDGPU: Enable InferAddressSpaces
llvm-svn: 294408
2017-02-08 06:16:04 +00:00
Tom Stellard
f2ec17e0e6 Re-commit AMDGPU/GlobalISel: Add support for simple shaders
Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551
2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin
0a8e20606c [AMDGPU] Internalize non-kernel symbols
Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549
2017-01-30 21:05:18 +00:00
Matt Arsenault
335594976a AMDGPU: Run AMDGPUCodeGenPrepare after inlining
With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525
2017-01-30 18:40:29 +00:00
Tom Stellard
d839aa304c Revert "AMDGPU/GlobalISel: Add support for simple shaders"
This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509
2017-01-30 17:42:41 +00:00
Tom Stellard
ca8f087f31 AMDGPU/GlobalISel: Add support for simple shaders
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293503
2017-01-30 17:09:15 +00:00
Stanislav Mekhanoshin
32e634e989 [AMDGPU] Turn AMDGPUUnifyMetadata back into module pass
With the adjustPassManager interface that is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300
2017-01-27 16:38:10 +00:00
Stanislav Mekhanoshin
4b31377e87 Replace addEarlyAsPossiblePasses callback with adjustPassManager
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189
2017-01-26 16:49:08 +00:00