1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

47921 Commits

Author SHA1 Message Date
Roman Lebedev
a0dac64487 [AMDGPU] Recognize x & ~(-1 << y) pattern.
Summary: The same pattern as D48010, but this one is IR-canonical as of D47428.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48012

llvm-svn: 334817
2018-06-15 09:56:45 +00:00
Roman Lebedev
733f1f7fbd [AMDGPU] Recognize x & ((1 << y) - 1) pattern.
Summary:
As a followup for D48007.

Since we already handle `x << (bitwidth - y) >> (bitwidth - y)` pattern,
which does not have ub for both the edge cases (`y == 0`, `y == bitwidth`),
i think also handling a pattern that is ub for `y == bitwidth` should be fine.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48010

llvm-svn: 334816
2018-06-15 09:56:39 +00:00
Roman Lebedev
ab8ea027b7 [AMDGPU] Recognize x & (-1 >> (32 - y)) pattern.
Summary:
D47980 will canonicalize the `x << (32 - y) >> (32 - y)`,
which is the pattern the AMDGPU expects to `x &  (-1 >> (32 - y))`,
which is not recognized by AMDGPU.

Thus, it needs to be recognized, too.

Reviewers: nhaehnle, bogner, tstellar, arsenm

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #amdgpu

Differential Revision: https://reviews.llvm.org/D48007

llvm-svn: 334815
2018-06-15 09:56:31 +00:00
Craig Topper
ddac8b162b Revert r334802 "[X86] Prevent folding stack reloads with instructions that have an undefined register update."
There's a typo causing the build to fail.

llvm-svn: 334803
2018-06-15 06:15:26 +00:00
Craig Topper
21c8de8539 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper
5526c28acc [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper
5584c8b9f5 [X86] Add 'Z' to the internal names of various EVEX instructions for overall consistency.
llvm-svn: 334785
2018-06-15 04:42:54 +00:00
Sanjay Patel
41b45b2b8c [x86] be more selective about converting 'and' to shuffle (PR37749)
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().

We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly 
are not worthwhile for AVX1. 

I think in most cases we are able to recover by converting 
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.

llvm-svn: 334759
2018-06-14 19:55:02 +00:00
Craig Topper
bb91ae466c [X86] Fix stale comment in folding tables.
llvm-svn: 334758
2018-06-14 19:28:31 +00:00
Tom Stellard
5608dfed6a AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.cvt.pkrtz
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45907

llvm-svn: 334757
2018-06-14 19:26:37 +00:00
Craig Topper
3eb8c69f72 [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper
33666c49e2 [X86] Remove '128' from the internal name of some scalar FP instructions to be consistent with other scalar instructions.
llvm-svn: 334727
2018-06-14 15:40:30 +00:00
Craig Topper
2cd52c8559 [X86] Disable load unfolding for a bunch of instruction where unfolding would increase the size of the load.
Found by an audit of the manual table vs the autogenerated table.

llvm-svn: 334726
2018-06-14 15:40:29 +00:00
Craig Topper
944ab10211 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

llvm-svn: 334725
2018-06-14 15:40:27 +00:00
Simon Dardis
ee8aa2671a [mips] Correct predicates for MSA pseudo instructions
llvm-svn: 334708
2018-06-14 13:03:53 +00:00
Craig Topper
ad3af3e8a7 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Tom Stellard
bec61a1866 AMDGPU/GlobalISel: Implement select() for 32-bit G_FADD and G_FMUL
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46171

llvm-svn: 334665
2018-06-13 22:30:47 +00:00
Craig Topper
726af1b5f7 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Stanislav Mekhanoshin
3138aeda82 [AMDGPU] Corrected computeKnownBits for V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48133

llvm-svn: 334640
2018-06-13 18:52:54 +00:00
Yaxun Liu
166d7e51aa [AMDGPU] Change enqueue kernel handle type
Currently the handle type is a global pointer which holds 8 bytes.
We need a larger type which hold 16 bytes, therefore change it
to [i64 x 2].

Differential Revision: https://reviews.llvm.org/D48094

llvm-svn: 334625
2018-06-13 17:31:51 +00:00
Dmitry Preobrazhensky
f3bc07c9d5 [AMDGPU][MC] Enabled parsing of relocations on VALU instructions
See bug 37566: https://bugs.llvm.org/show_bug.cgi?id=37566

Reviewers: artem.tamazov, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D47884

llvm-svn: 334622
2018-06-13 17:02:03 +00:00
Dmitry Preobrazhensky
224cb6c72d [AMDGPU][MC][GFX8][GFX9] Allow LDS direct reads for BUFFER_LOAD_DWORDX2/X3/X4
See bug 37653: https://bugs.llvm.org/show_bug.cgi?id=37653

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D47885

llvm-svn: 334609
2018-06-13 15:32:46 +00:00
Tom Stellard
968b81b26b AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering
Summary:
The code that handles ISD:Register and ISD::CopyFromReg assumes
the target is amdgcn, so this is broken on r600.  We don't
need this analysis on r600 anyway so we can safely move
it to SITargetLowering.

Reviewers: alex-t, arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: msearles, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D46298

llvm-svn: 334607
2018-06-13 15:06:37 +00:00
Zoran Jovanovic
58b914129e [mips][microMIPS] Extending size reduction pass with LWP and SWP
Author: milena.vujosevic.janicic
Reviewers: sdardis
The patch extends size reduction pass for MicroMIPS.
It introduces reduction of two instructions into one instruction:
Two SW instructions are transformed into one SWP instrucition.
Two LW instructions are transformed into one LWP instrucition.
Differential Revision: https://reviews.llvm.org/D39115

llvm-svn: 334595
2018-06-13 12:51:37 +00:00
Sanjay Patel
434596cb1e [x86] eliminate even more sign-bit tests with vector select
This shortcoming was noted in D47330, and the test diffs show we already 
had other examples where we failed to fold to a SHRUNKBLEND:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.

This patch implements an idea from D48043 and would obsolete that patch 
because it catches more cases (notable the AVX1 case that was missed there). 
All we're doing is allowing the existing transform to fire more often by 
removing the post-legalize constraint. All of the relevant feature checks 
and other predicates are left as-is.

Differential Revision: https://reviews.llvm.org/D48078

llvm-svn: 334592
2018-06-13 12:28:32 +00:00
Alex Bradbury
446ce455bc [RISCV] Add codegen support for atomic load/stores with RV32A
Fences are inserted according to table A.6 in the current draft of version 2.3
of the RISC-V Instruction Set Manual, which incorporates the memory model
changes and definitions contributed by the RISC-V Memory Consistency Model
task group.

Instruction selection failures will now occur for 8/16/32-bit atomicrmw and 
cmpxchg operations when targeting RV32IA until lowering for these operations 
is added in a follow-on patch.

Differential Revision: https://reviews.llvm.org/D47589

llvm-svn: 334591
2018-06-13 12:04:51 +00:00
Alex Bradbury
e0603b95db [RISCV] Codegen support for atomic operations on RV32I
This patch adds lowering for atomic fences and relies on AtomicExpandPass to
lower atomic loads/stores, atomic rmw, and cmpxchg to __atomic_* libcalls.

test/CodeGen/RISCV/atomic-* are modelled on the exhaustive
test/CodeGen/PPC/atomics-regression.ll, and will prove more useful once RV32A
codegen support is introduced.

Fence mappings are taken from table A.6 in the current draft of version 2.3 of
the RISC-V Instruction Set Manual, which incorporates the memory model changes
and definitions contributed by the RISC-V Memory Consistency Model task group.

Differential Revision: https://reviews.llvm.org/D47587

llvm-svn: 334590
2018-06-13 11:58:46 +00:00
Clement Courbet
e3e2fa9c0a [TableGen] Emit a fatal error on inconsistencies in resource units vs cycles.
Summary:
For targets I'm not familiar with, I've automatically made the "default to 1 for each resource" behaviour explicit in the td files.
For more obvious cases, I've ventured a fix.

Some notes:
 - Exynos is especially fishy.
 - AArch64SchedThunderX2T99.td had some truncated entries. If I understand correctly, the person who wrote that interpreted the ResourceCycle as a range. I made the decision to use the upper/lower bound for consistency with the 'Latency' value. I'm sure there is a better choice.
 - The change to X86ScheduleBtVer2.td is an NFC, it just makes values more explicit.

Also see PR37310.

Reviewers: RKSimon, craig.topper, javed.absar

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D46356

llvm-svn: 334586
2018-06-13 09:41:49 +00:00
Hiroshi Inoue
c50f941cad [PowerPC] fix trivial typos in comment, NFC
llvm-svn: 334583
2018-06-13 08:54:13 +00:00
Hiroshi Inoue
2dde032c9a [PowerPC] avoid verification failure due to PowerPC VSX Swap Removal pass
This patch fixes a failure in lnt tests with -verify-machineinstrs option.
When VSX Swap Removal pass swaps two register operands, it did not maintain kill flags associated with operands. This patch swaps flags as well as register number to avoid inconsistent kill flags information.

llvm-svn: 334579
2018-06-13 08:25:14 +00:00
Craig Topper
45edfd5c6f [X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead.
llvm-svn: 334576
2018-06-13 07:19:21 +00:00
Craig Topper
73c6ebc8bf [X86] Mark all instructions that have masked store semantics with NotMemoryFoldable. Remove dependency on SchedRW from memory table autogenerator.
Previously we were whitelisting in instructions based on their SchedRW value. With the masked store instructions explicitly removed via NotMemoryFoldable, we don't seem to need this check anymore.

llvm-svn: 334563
2018-06-13 00:04:08 +00:00
Craig Topper
7b76729ea5 [X86] Remove VPCOMPRESSB/W from the autogenerated load folding table.
llvm-svn: 334562
2018-06-13 00:04:04 +00:00
Stanislav Mekhanoshin
017429bc2b [AMDGPU] DAG combine to produce V_PERM_B32
Differential Revision: https://reviews.llvm.org/D48099

llvm-svn: 334559
2018-06-12 23:50:37 +00:00
Krzysztof Parzyszek
90f9f3ddd4 [DAGCombiner] Recognize more patterns for ABS
Differential Revision: https://reviews.llvm.org/D47831

llvm-svn: 334553
2018-06-12 21:51:49 +00:00
Petr Hosek
4239933fa7 [AArch64] Support reserving x20 register
Register x20 is a callee-saved register which may be used for other
purposes in certain contexts, for example to hold special variables
within the kernel. This change adds support for reserving this register
both to frontend and backend to make this register usable for these
purposes.

Differential Revision: https://reviews.llvm.org/D46552

llvm-svn: 334531
2018-06-12 20:00:50 +00:00
Craig Topper
e65706af2c [X86] Remove mayLoad flag from AVX512 truncating store instructions.
llvm-svn: 334529
2018-06-12 19:59:08 +00:00
Reid Kleckner
aeb232db7c [MS][ARM64] Hoist __ImageBase handling into TargetLoweringObjectFileCOFF
All COFF targets should use @IMGREL32 relocations for symbol differences
against __ImageBase. Do the same for getSectionForConstant, so that
immediates lowered to globals get merged across TUs.

Patch by Chris January

Differential Revision: https://reviews.llvm.org/D47783

llvm-svn: 334523
2018-06-12 18:56:05 +00:00
Konstantin Zhuravlyov
fb41a59565 AMDHSA/NFC: Code object v3 updates (additional):
- Move section selection and alignment to AMDGPUAsmPrinter

llvm-svn: 334521
2018-06-12 18:33:51 +00:00
Konstantin Zhuravlyov
7de6ea264e AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments

llvm-svn: 334519
2018-06-12 18:02:46 +00:00
Fangrui Song
5fc4e7a80f [MC] [X86] Teach leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 to use R_X86_64_GOTPC32 instead of R_X86_64_PC32
Summary:
This is similar to D46319 (ARM). x86-64 psABI p40 gives an example:

  leaq _GLOBAL_OFFSET_TABLE(%rip), %r15 # GOTPC32 reloc

GNU as creates R_X86_64_GOTPC32. However, MC currently emits R_X86_64_PC32.

Reviewers: javed.absar, echristo

Subscribers: kristof.beyls, llvm-commits, peter.smith, grimar

Differential Revision: https://reviews.llvm.org/D47507

llvm-svn: 334515
2018-06-12 16:20:44 +00:00
Simon Pilgrim
f6cb95e1e4 [CostModel] Replace ShuffleKind::SK_Alternate with ShuffleKind::SK_Select (PR33744)
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:

e.g. v4f32: <0,5,2,7> or <4,1,6,3>

This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:

e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.

This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.

Differential Revision: https://reviews.llvm.org/D47985

llvm-svn: 334513
2018-06-12 16:12:29 +00:00
Craig Topper
115cd845d4 [X86] Remove TB_ALIGN_16 from VEXTRACTF128/VEXTRACTI128 in the memory folding table.
llvm-svn: 334511
2018-06-12 15:48:03 +00:00
Krzysztof Parzyszek
5570d6d39d [Hexagon] Make floating point operations expensive for vectorization
llvm-svn: 334508
2018-06-12 15:12:50 +00:00
Sanjay Patel
7bd5278a54 [x86] move shrunkblend transform to helper function; NFCI
We should be able to obsolete D48043 by easing the constraints
on this existing code. 

llvm-svn: 334504
2018-06-12 14:21:51 +00:00
Krzysztof Parzyszek
cca7086924 [SelectionDAG] Provide default expansion for rotates
Implement default legalization of rotates: either in terms of the rotation
in the opposite direction (if legal), or in terms of shifts and ors.

Implement generating of rotate instructions for Hexagon. Hexagon only
supports rotates by an immediate value, so implement custom lowering of
ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.

Differential Revision: https://reviews.llvm.org/D47725

llvm-svn: 334497
2018-06-12 12:49:36 +00:00
Simon Dardis
3c0a89fac1 [mips] Guard some floating point instructions correctly
Reviewers: smaksimovic, atanasyan, abeserminji

Differential Revision: https://reviews.llvm.org/D47636

llvm-svn: 334491
2018-06-12 10:28:06 +00:00
Aleksandar Beserminji
8dac2acfcf [mips] Extend LONG_BRANCH_LUi/ADDiu with extra parameter
Extend LONG_BRANCH_LUi and LONG_BRANCH_ADDiu pseudo instructions with
additional flag, so instead of always lowering to lui %hi(...),
addiu %lo(...) or addiu %hi(...), now they can lower to either %lo, %hi,
%higher or %highest depending on the added flag.

Differential Revision: https://reviews.llvm.org/D47941

llvm-svn: 334490
2018-06-12 10:23:49 +00:00
Luke Geeson
577f7ff74f [AArch64] Audit on rL333879 to fix FP16 64bit bitpatterns
llvm-svn: 334488
2018-06-12 09:35:20 +00:00
Craig Topper
b33c89957c [X86] Add NotMemoryFoldable to the VPCOMPRESS instructions.
llvm-svn: 334481
2018-06-12 07:32:19 +00:00