1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

20 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
9363e58d6d [AMDGPU] Add gfx1030 target
Differential Revision: https://reviews.llvm.org/D81886
2020-06-15 16:18:05 -07:00
Matt Arsenault
40d3e7765c AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Matt Arsenault
8bdcb1c2a9 AMDGPU: Assume f32 denormals are enabled by default
This will likely introduce catastrophic performance regressions on
older subtargets, but should be correct. A follow up change will
remove the old fp32-denormals subtarget features, and switch to using
the new denormal-fp-math/denormal-fp-math-f32 attributes. Frontends
should be making sure to add the denormal-fp-math-f32 attribute when
appropriate to avoid performance regressions.
2020-04-02 17:17:12 -04:00
Tim Renouf
10fe28f702 [AMDGPU] Fix some tests that did not specify -mcpu
Summary:
This fixes some tests that did not specify -mcpu. Doing that disables
all subtarget features, which gives behavior that (a) does not
necessarily correspond to any actual target, and (b) can change as we
add new subtarget features.

Also added gfx1010 to memtime test.

Differential Revision: https://reviews.llvm.org/D74594

Change-Id: I8c0fe4fa03e9a93ef8bb722cd42d22e064526309
2020-02-17 14:02:32 +00:00
Tim Renouf
c57b91da41 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
llvm-svn: 356659
2019-03-21 12:01:21 +00:00
Matt Arsenault
7ead0f3ed6 AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Alexander Timofeev
71cf453d98 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
e6c7524092 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

llvm-svn: 307054
2017-07-04 02:14:18 +00:00
Alexander Timofeev
ef2bf78d2f [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307026
2017-07-03 14:54:11 +00:00
Matt Arsenault
dd9ab77318 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
2d30bf1fc6 DAG: Check no signed zeros instead of unsafe math attribute
llvm-svn: 297354
2017-03-09 01:36:39 +00:00
Matt Arsenault
c4ccc9b791 DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Matt Arsenault
81a9bfe915 Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 292982
2017-01-24 22:02:15 +00:00
Matt Arsenault
bd33194651 AMDGPU: Combine fp16/fp64 subtarget features
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.

llvm-svn: 292837
2017-01-23 22:31:03 +00:00
Matt Arsenault
f781954dd4 AMDGPU: Fix folding immediates into mac src2
Whether it is legal or not needs to check for the instruction
it will be replaced with.

llvm-svn: 291711
2017-01-11 22:00:02 +00:00
Matt Arsenault
c37f15f5b5 AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing right thing for fneg

llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Nikolay Haustov
048a920e0e AMDGPU/SI: Assembler: Unify parsing/printing of operands.
Summary:
The goal is for each operand type to have its own parse function and
at the same time share common code for tracking state as different
instruction types share operand types (e.g. glc/glc_flat, etc).

Introduce parseAMDGPUOperand which can parse any optional operand.
DPP and Clamp/OMod have custom handling for now. Sam also suggested
to have class hierarchy for operand types instead of table. This
can be done in separate change.

Remove parseVOP3OptionalOps, parseDS*OptionalOps, parseFlatOptionalOps,
parseMubufOptionalOps, parseDPPOptionalOps.
Reduce number of definitions of AsmOperand's and MatchClasses' by using common base class.
Rename AsmMatcher/InstPrinter methods accordingly.
Print immediate type when printing parsed immediate operand.
Use 'off' if offset/index register is unused instead of skipping it to make it more readable (also agreed with SP3).
Update tests.

Reviewers: tstellarAMD, SamWot, artem.tamazov

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19584

llvm-svn: 268015
2016-04-29 09:02:30 +00:00
Matt Arsenault
2dfb6d03c5 AMDGPU: Run SIFoldOperands after PeepholeOptimizer
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.

shader-db stats:

Totals:
SGPRS: 34200 -> 34336 (0.40 %)
VGPRS: 22118 -> 21655 (-2.09 %)
Code Size: 632144 -> 633460 (0.21 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 10240 -> 11264 (10.00 %) bytes per wave
Max Waves: 8822 -> 8918 (1.09 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 7704 -> 7840 (1.77 %)
VGPRS: 5169 -> 4706 (-8.96 %)
Code Size: 234444 -> 235760 (0.56 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Scratch: 0 -> 1024 (0.00 %) bytes per wave
Max Waves: 1188 -> 1284 (8.08 %)
Wait states: 0 -> 0 (0.00 %)

Increases:
SGPRS: 35 (0.01 %)
VGPRS: 1 (0.00 %)
Code Size: 59 (0.02 %)
LDS: 0 (0.00 %)
Scratch: 1 (0.00 %)
Max Waves: 48 (0.02 %)
Wait states: 0 (0.00 %)

Decreases:
SGPRS: 26 (0.01 %)
VGPRS: 54 (0.02 %)
Code Size: 68 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Max Waves: 4 (0.00 %)
Wait states: 0 (0.00 %)

llvm-svn: 266378
2016-04-14 21:58:24 +00:00
Matt Arsenault
7193bf43c5 AMDGPU: Add volatile to test loads and stores
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.

llvm-svn: 266071
2016-04-12 13:38:18 +00:00
Tom Stellard
24371a62d2 AMDGPU/SI: Select mad patterns to v_mac_f32
The two-address instruction pass will convert these back to v_mad_f32
if necessary.

Differential Revision: http://reviews.llvm.org/D11060

llvm-svn: 242038
2015-07-13 15:47:57 +00:00