mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 12:41:49 +01:00

35389 Commits

Author SHA1 Message Date
Baptiste Saleil
5f9d8eb8f8 [PowerPC] Add clang options to control MMA support
This patch adds frontend and backend options to enable and disable
the PowerPC MMA operations added in ISA 3.1. Instructions using these
options will be added in subsequent patches.

Differential Revision: https://reviews.llvm.org/D81442
2020-08-24 09:35:55 -05:00
Matt Arsenault
7af9eb8150 AMDGPU/GlobalISel: Use different technique for sample v3s16 values
Avoid relying on implicit_def values, and odd sized G_INSERT/G_EXTRACT
2020-08-24 10:07:30 -04:00
Matt Arsenault
1221909666 AMDGPU/GlobalISel: Add baseline, failing unmerge tests 2020-08-24 10:07:30 -04:00
Matt Arsenault
9583788fb1 AMDGPU/GlobalISel: Start implementing computeKnownBitsForTargetInstr
Handle workitem intrinsics. There isn't really a way to adequately test
this right now, since none of the known bits users are fine grained
enough to test the edge conditions. This triggers a number of
instances of the new 64-bit to 32-bit shift combine in the existing
tests.
2020-08-24 09:53:27 -04:00
Matt Arsenault
21aca8a3e2 GlobalISel: Reduce G_SHL width if source is extension
shl ([sza]ext x, y) => zext (shl x, y).

Turns expensive 64-bit shifts into 32-bit shifts if the result does not
overflow the source type.

This is a port of an AMDGPU DAG combine added in
5fa289f0d8ff85b9e14d2f814a90761378ab54ae. InstCombine does this
already, but we need to do it again here to apply it to shifts
introduced for lowered getelementptrs. This will help matching
addressing modes that use 32-bit offsets in a future patch.
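
As a rough illustration of the narrowing rule, here is a minimal C++ sketch that checks it at run time for a zero-extended 32-bit source. The real combine decides this at compile time from known bits; the helper names `wideShl`, `narrowShl`, and `fitsInSource` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: shl (zext x), y, performed as a 64-bit shift.
inline uint64_t wideShl(uint32_t x, unsigned y) {
  assert(y < 64 && "shift amount must be smaller than the wide type");
  return static_cast<uint64_t>(x) << y;
}

// Narrowed form: zext (shl x, y), performed as a 32-bit shift.
inline uint64_t narrowShl(uint32_t x, unsigned y) {
  assert(y < 32 && "shift amount must be smaller than the source type");
  return static_cast<uint64_t>(static_cast<uint32_t>(x << y));
}

// True when no set bit of x is shifted out of the 32-bit source type,
// which is the condition under which both forms agree.
inline bool fitsInSource(uint32_t x, unsigned y) {
  return y < 32 && (static_cast<uint64_t>(x) << y) <= UINT32_MAX;
}
```

When `fitsInSource(x, y)` holds, `wideShl(x, y)` and `narrowShl(x, y)` return the same value; when it does not, the narrowed form loses the bits that crossed bit 31.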

TableGen annoyingly assumes only a single match data operand, so
introduce a reusable struct. However, this still requires defining a
separate GIMatchData for every combine which is still annoying.

Adds a morally equivalent function to the existing
getShiftAmountTy. Without this, we would have to repeatedly
query the legalizer info and guess at what type to use for the shift.
2020-08-24 09:42:40 -04:00
Bjorn Pettersson
8f041837f3 [SelectionDAG] Fix miscompile bug in expandFunnelShift
This is a fixup of commit 0819a6416fd217 (D77152) which could
result in miscompiles. The miscompile could only happen for targets
where isOperationLegalOrCustom could return different values for
FSHL and FSHR.

The commit mentioned above added logic in expandFunnelShift to
convert between FSHL and FSHR by swapping direction of the
funnel shift. However, that transform is only legal if we know
that the shift count (modulo bitwidth) isn't zero.

Basically, since fshr(-1,0,0)==0 and fshl(-1,0,0)==-1 then doing a
rewrite such as fshr(X,Y,Z) => fshl(X,Y,0-Z) would be incorrect if
Z modulo bitwidth could be zero.

```
$ ./alive-tv /tmp/test.ll

----------------------------------------
define i32 @src(i32 %x, i32 %y, i32 %z) {
%0:
  %t0 = fshl i32 %x, i32 %y, i32 %z
  ret i32 %t0
}
=>
define i32 @tgt(i32 %x, i32 %y, i32 %z) {
%0:
  %t0 = sub i32 32, %z
  %t1 = fshr i32 %x, i32 %y, i32 %t0
  ret i32 %t1
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %x = #x00000000 (0)
i32 %y = #x00000400 (1024)
i32 %z = #x00000000 (0)

Source:
i32 %t0 = #x00000000 (0)

Target:
i32 %t0 = #x00000020 (32)
i32 %t1 = #x00000400 (1024)
Source value: #x00000000 (0)
Target value: #x00000400 (1024)
```
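
The zero-shift problem can also be reproduced outside the compiler. The following is a minimal C++ sketch of 32-bit funnel shifts that follows the semantics of the LLVM fshl/fshr intrinsics; the function names (`fshl32`, `fshr32`, `fshrViaFshl`, `checkFunnelShiftRewrite`) are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate a:b (a is the high half), shift left by c modulo 32,
// and keep the high 32 bits.
inline uint32_t fshl32(uint32_t a, uint32_t b, uint32_t c) {
  c %= 32;
  return c == 0 ? a : (a << c) | (b >> (32 - c));
}

// fshr: concatenate a:b, shift right by c modulo 32, and keep the low 32 bits.
inline uint32_t fshr32(uint32_t a, uint32_t b, uint32_t c) {
  c %= 32;
  return c == 0 ? b : (a << (32 - c)) | (b >> c);
}

// The kind of rewrite that was removed: fshr(x, y, z) computed as fshl(x, y, 0 - z).
inline uint32_t fshrViaFshl(uint32_t x, uint32_t y, uint32_t z) {
  return fshl32(x, y, 0u - z);
}

inline void checkFunnelShiftRewrite() {
  // The rewrite agrees with fshr when z modulo 32 is non-zero...
  assert(fshr32(0x12345678u, 0x9ABCDEF0u, 5) ==
         fshrViaFshl(0x12345678u, 0x9ABCDEF0u, 5));
  // ...but not when it is zero: fshr returns its second operand,
  // fshl returns its first.
  assert(fshr32(0xFFFFFFFFu, 0u, 0) == 0u);
  assert(fshrViaFshl(0xFFFFFFFFu, 0u, 0) == 0xFFFFFFFFu);
}
```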

It could be possible to add back the transform, given that logic
is added to check that (Z % BW) can't be zero. Since there were
no test cases proving that such a transform actually would be useful
I decided to simply remove the faulty code in this patch.

Reviewed By: foad, lebedev.ri

Differential Revision: https://reviews.llvm.org/D86430
2020-08-24 09:52:11 +02:00
Qiu Chaofan
9835dee1b9 [PowerPC] Support lowering int-to-fp on ppc_fp128
D70867 introduced support for expanding most ppc_fp128 operations. But
sitofp/uitofp is missing. This patch adds that after D81669.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D81918
2020-08-24 11:18:16 +08:00
Qiu Chaofan
04286d2214 [PowerPC] Allow constrained FP intrinsics in mightUseCTR
We may hit an "Invalid CTR loop" crash when there are constrained ops inside.
This patch adds constrained FP intrinsics to the list so that CTR loop
verification doesn't complain about it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81924
2020-08-24 11:09:58 +08:00
QingShan Zhang
4ba8c0db80 [DAGCombine] Remove dead node when it is created by getNegatedExpression
We hit the compile-time problem reported in https://bugs.llvm.org/show_bug.cgi?id=46877
and the reason is the same as D77319. So we need to remove the dead node we created
to avoid increasing the problem size of DAGCombiner.

Reviewed By: Spatel

Differential Revision: https://reviews.llvm.org/D86183
2020-08-24 02:50:58 +00:00
Qiu Chaofan
6cd03c3d8a [PowerPC] Support constrained vector fp/int conversion
This patch makes these operations legal, and adds necessary codegen
patterns.

There is still an issue similar to D77033 for conversion from the v1i128
type, but the normal type tests synced in vector-constrained-fp-intrinsic
pass successfully.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D83654
2020-08-24 10:10:27 +08:00
Fangrui Song
1b73f40c27 [X86][FastISel] Support materializing floating-point constants for large code model & PIC
The following program miscompiles because rL216012 added static
relocation model support but not for PIC.

```
// clang -fpic -mcmodel=large -O0 a.cc
double foo() { return 42.0; }
```

This patch adds PIC support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86024
2020-08-23 08:36:18 -07:00
Sanjay Patel
e023c2fe80 [AArch64] add tests for store merge of truncs; NFC 2020-08-22 14:54:40 -04:00
Matt Arsenault
4175888640 GlobalISel: Merge FewerElements for G_BUILD_VECTOR/G_CONCAT_VECTORS
This switches from using G_EXTRACT in odd cases to widen with undef
and unmerge.
2020-08-22 10:25:53 -04:00
Stanislav Mekhanoshin
f6faf01d47 [AMDGPU] Avoid sorting stalls in regbank-reassign
This is the slowest operation in the already slow pass.
Instead of sorting, just put the stall list into an ordered
map.
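
A minimal C++ sketch of the general idea, not of the pass itself: entries inserted into an ordered container come out sorted by key, so no separate sort step is needed. The `Candidate` type, the helper names, and the ascending order by stall count are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical candidate: a register name and the stalls it would cause.
struct Candidate {
  std::string Reg;
  unsigned Stalls;
};

// Keyed by stall count; std::multimap keeps its entries ordered by key and
// preserves insertion order among equal keys.
using StallMap = std::multimap<unsigned, std::string>;

inline StallMap collectByStalls(const std::vector<Candidate> &Candidates) {
  StallMap Map;
  for (const Candidate &C : Candidates)
    Map.emplace(C.Stalls, C.Reg);
  return Map;
}

// Walk the map in key order; no explicit sort is performed anywhere.
inline std::vector<std::string> inStallOrder(const StallMap &Map) {
  std::vector<std::string> Out;
  for (const auto &Entry : Map)
    Out.push_back(Entry.second);
  assert(Out.size() == Map.size());
  return Out;
}
```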

Differential Revision: https://reviews.llvm.org/D86253
2020-08-21 11:49:41 -07:00
Qiu Chaofan
5585c57018 [PowerPC] Support constrained scalar sitofp/uitofp
This patch adds support for constrained scalar int to fp operations on
PowerPC. Besides, this also fixes the FP exception bit of FCFID*
instructions.

Reviewed By: steven.zhang, uweigand

Differential Revision: https://reviews.llvm.org/D81669
2020-08-22 02:10:29 +08:00
Kamau Bridgeman
710eb84124 [PowerPC][PCRelative] Thread Local Storage Support for Initial Exec
This patch adds initial support for the Initial Exec Thread Local
Storage model, producing the code sequence and relocations that the ABI
requires for the model when using PC-relative memory operations.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D81947
2020-08-21 10:13:11 -05:00
diggerlin
c4598749ca [AIX][XCOFF] emit symbol visibility for xcoff object file.
SUMMARY:

Reviewers:  Jason liu

Differential Revision: https://reviews.llvm.org/D84265
2020-08-21 11:00:56 -04:00
Cameron McInally
5b6246666b [SVE] Lower fixed length UDIV to scalable
Pretty much just a copy of the SDIV patches (D86114 and D85982) with string replacement.

Differential Revision: https://reviews.llvm.org/D86316
2020-08-21 09:01:25 -05:00
Nemanja Ivanovic
d3c6d8fa89 [PowerPC] Pre-commit FISel with PC-Rel test
Our handling of PC-Relative addressing is currently broken with
Fast ISel in 3 ways:
- FISel emits calls without handling all the PC-Rel intricacies
- FISel materializes FP constants through the TOC
- FISel materializes GV's through the TOC

As it would be unnecessarily tedious to implement all the handling
for PC-Rel in Fast ISel, we will turn off FISel for anything that
generates references to the TOC.
2020-08-21 06:58:37 -05:00
lewis-revill
fd94b72c2e [RISCV] Fix inaccurate annotations on PseudoBRIND
PseudoBRIND had seemingly inherited incorrect annotations denoting it as
a call instruction and that it defines X1/ra. This caused excess
save/restore code to be emitted for ra.

Differential Revision: https://reviews.llvm.org/D86286
2020-08-21 11:38:42 +01:00
Mirko Brkusanin
afddfeaf4a [AMDGPU] Use ds_read/write_b96/b128 when possible for SDag
Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.

Differential Revision: https://reviews.llvm.org/D84403
2020-08-21 12:26:31 +02:00
Mirko Brkusanin
09694f5b10 [AMDGPU][GlobalISel] Fix 96 and 128 local loads and stores
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.

Differential Revision: https://reviews.llvm.org/D81638
2020-08-21 12:26:31 +02:00
Mirko Brkusanin
08706e7bce [AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can
now be used to quickly check both.
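
The relationship between the capability features and the mode can be pictured with the minimal C++ sketch below. `SubtargetSketch` is a hypothetical stand-in and not the real GCNSubtarget class; the field and method names are taken from this commit message, and the method bodies are an assumption based on its description of "checking both".

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget.
struct SubtargetSketch {
  bool UnalignedBufferAccess = false; // hardware supports unaligned buffer access
  bool UnalignedDSAccess = false;     // hardware supports unaligned DS access
  bool UnalignedAccessMode = false;   // unaligned access is enabled

  // Unaligned access is usable only when the hardware supports it
  // and the mode enables it.
  bool hasUnalignedBufferAccessEnabled() const {
    return UnalignedBufferAccess && UnalignedAccessMode;
  }
  bool hasUnalignedDSAccessEnabled() const {
    return UnalignedDSAccess && UnalignedAccessMode;
  }
};
```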

Differential Revision: https://reviews.llvm.org/D84522
2020-08-21 12:26:31 +02:00
Mirko Brkusanin
49f2d14543 [AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similarly to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.

Differential Revision: https://reviews.llvm.org/D82788
2020-08-21 12:26:31 +02:00
Jay Foad
6d725be5b3 [SelectionDAG] Better legalization for FSHL and FSHR
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.

Differential Revision: https://reviews.llvm.org/D77152
2020-08-21 10:32:49 +01:00
Michael Liao
1cf2d56956 [amdgpu] Add codegen support for HIP dynamic shared memory.
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
  the dynamic shared memory, whose size is not known at
  compile time.

Reviewers: arsenm, yaxunl, kpyzhov, b-sumner

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82496
2020-08-20 21:29:18 -04:00
Matt Arsenault
587fbc0a85 CodeGen: Don't drop AA metadata when splitting MachineMemOperands
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
2020-08-20 16:17:30 -04:00
Matt Arsenault
bd23f78f2f AMDGPU/GlobalISel: Legalize odd sized loads with widening
Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.

This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.
2020-08-20 16:15:53 -04:00
Kamau Bridgeman
7be92ab238 [PowerPC][PCRelative] Thread Local Storage Support for General Dynamic
This patch adds initial support for the General Dynamic Thread Local
Storage model, producing the code sequence and relocations that the ABI
requires for the model when using PC-relative memory operations.

Patch by: NeHuang

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D82315
2020-08-20 15:08:13 -05:00
Cameron McInally
640a9a840f [NFCI][SVE] Move fixed length i32/i64 SDIV tests
Move fixed length SDIV tests from sve-fixed-length-int-arith.ll to sve-fixed-length-int-div.ll. The former uses CHECK lines that verify legalization decisions. That's overkill for the i8/i16 SDIV tests, since they have a tricky legalization.
2020-08-20 14:46:26 -05:00
Cameron McInally
06340b3cd4 [SVE] Lower fixed length vXi8/vXi16 SDIV to scalable
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.

Differential Revision: https://reviews.llvm.org/D86114
2020-08-20 13:47:01 -05:00
David Green
9435e1e36e [ARM] Regenerate mve-vabd.ll test. NFC 2020-08-20 12:24:27 +01:00
Paul Walker
fdacd25874 [SVE] Add ISEL patterns for predicated shifts by an immediate.
For scalable vector shifts the predicate is typically all active,
which gets selected to an unpredicated shift by immediate.  When
code generating for fixed length vectors the predicate is based
on the vector length and so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.

Differential Revision: https://reviews.llvm.org/D86204
2020-08-20 11:47:20 +01:00
Konstantin Schwarz
cfcda8d055 [GlobalISel][IRTranslator] Support PHI instructions in landingpad blocks
The check for the landingpad instructions was overly restrictive. In optimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.

This change relaxes the check to allow PHI nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86141
2020-08-20 10:49:31 +02:00
Yvan Roux
014e02db94 [ARM][MachineOutliner] Add default mode.
Use the stack to save and restore the link register when there is no
available register to do it.

Differential Revision: https://reviews.llvm.org/D76069
2020-08-20 09:25:33 +02:00
Qiu Chaofan
cf3153bbd5 [PowerPC] Support constrained scalar fptosi/fptoui
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.

Reviewed By: steven.zhang, uweigand

Differential Revision: https://reviews.llvm.org/D81537
2020-08-20 13:29:43 +08:00
Matt Arsenault
734b071bb5 GlobalISel: Implement fewerElementsVector for G_CONCAT_VECTORS sources
This fixes <6 x s16> = G_CONCAT_VECTORS from <3 x s16> handling.
2020-08-19 18:53:24 -04:00
Raul Tambre
90fec87fdc [AArch64][GlobalISel] Handle rtcGPR64RegClassID in AArch64RegisterBankInfo::getRegBankFromRegClass()
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.

It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.

Differential Revision: https://reviews.llvm.org/D85720
2020-08-19 12:52:30 -07:00
Jessica Paquette
153c17604a [GlobalISel] Add combine for (x & mask) -> x when (x & mask) == x
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.

This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.

In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.

e.g.

```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```

Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
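
The condition behind this combine can be sketched with known-bits reasoning. The minimal C++ sketch below uses a hypothetical `KnownBits32` struct rather than LLVM's KnownBits class, and `knownBitsOfCompare` assumes a target whose compares produce 0 or 1 with all upper bits zero.

```cpp
#include <cassert>
#include <cstdint>

// Bits of a 32-bit value known at compile time. A set bit in Zero means the
// corresponding bit of the value is known to be 0; likewise One for 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// (x & mask) == x holds for every possible x exactly when each bit that the
// mask clears is already known to be zero in x.
inline bool andIsRedundant(KnownBits32 X, uint32_t Mask) {
  assert((X.Zero & X.One) == 0 && "a bit cannot be known 0 and known 1");
  return (~Mask & ~X.Zero) == 0;
}

// Known bits of a compare result, assuming booleans are zero-extended:
// every bit above bit 0 is known zero.
inline KnownBits32 knownBitsOfCompare() { return {~1u, 0u}; }
```

With these definitions, `andIsRedundant(knownBitsOfCompare(), 1u)` is true, which corresponds to dropping the G_AND in the example.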

Differential Revision: https://reviews.llvm.org/D85463
2020-08-19 10:20:57 -07:00
Matt Arsenault
92c99a3fbc AMDGPU/GlobalISel: Add some bitcast tests 2020-08-19 10:38:39 -04:00
Matt Arsenault
a85deae728 AMDGPU/GlobalISel: Add selection tests for pointer constants 2020-08-19 10:23:56 -04:00
Simon Pilgrim
55538e289f [X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths.
Thanks to @fhahn for the test case.
2020-08-19 13:00:26 +01:00
David Green
e32403463f [ARM] Change target triple to arm-none-none-eabi. NFC 2020-08-19 11:58:50 +01:00
Simon Pilgrim
fe8e9d75c1 [X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling.
Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.
2020-08-19 11:39:27 +01:00
Paul Walker
e3dc616e5a [SVE] Add tests for fixed length vector integer operations with immediate operands. 2020-08-19 11:12:03 +01:00
Simon Pilgrim
b05b7fd391 [X86][AVX] Fold store(extract_element(vtrunc)) to truncated store
Add handling for storing the extracted lower (truncated bits) element from an X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly.

Differential Revision: https://reviews.llvm.org/D86158
2020-08-19 11:10:20 +01:00
Ronak Chauhan
4697f34ed6 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit cacfb02d28a3cabd4e45d2535cb0686cef48a2c9.

Reverting due to buildbot failures.
2020-08-19 13:12:29 +05:30
David Sherwood
f7a1832d69 [SVE][CodeGen] Fix scalable vector issues in DAGTypeLegalizer::GenWidenVectorLoads
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.
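
To see why a plain unsigned offset is not enough, consider the minimal C++ sketch below. `SizeSketch` and `incrementPointerSketch` are hypothetical stand-ins for TypeSize and the IncrementPointer helper, and `vscale` is passed in as an ordinary number purely for illustration; in generated code it is a runtime quantity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a size that may be a multiple of vscale.
struct SizeSketch {
  uint64_t MinBytes; // size when vscale == 1
  bool Scalable;     // true if the real size is MinBytes * vscale

  uint64_t bytesAt(uint64_t VScale) const {
    assert(VScale >= 1 && "vscale is a positive integer");
    return Scalable ? MinBytes * VScale : MinBytes;
  }
};

// Address of the next load after one of size Loaded has been emitted.
inline uint64_t incrementPointerSketch(uint64_t Addr, SizeSketch Loaded,
                                       uint64_t VScale) {
  return Addr + Loaded.bytesAt(VScale);
}
```

For a fixed 16-byte load the next address is always 16 bytes further on; for a scalable load with a 16-byte minimum and a vscale of 4, it is 64 bytes further on.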

Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStores,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.

I've added new tests in

  CodeGen/AArch64/sve-split-load.ll
  CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll

for the changes in GenWidenVectorLoads.

Differential Revision: https://reviews.llvm.org/D85909
2020-08-19 07:54:32 +01:00
Ronak Chauhan
142f4dd209 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D80713
2020-08-19 08:49:07 +05:30
Changpeng Fang
c3904f6ffc AMDGPU: Implement waterfall loop for MIMG instructions with 256-bit SRsrc
Summary:
  When the resource descriptor is in a vgpr, we need a waterfall loop
to read it into an sgpr. In this patch we generalize the implementation
to work for any register class size, and extend the work to MIMG
instructions.
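
For readers unfamiliar with the technique, a waterfall loop can be simulated on the CPU as in the minimal C++ sketch below. It is not the code emitted by this patch: the `Descriptor` type, the `waterfall` function, and the 64-lane limit are assumptions made for this illustration. Each iteration takes the descriptor of the first pending lane as the uniform value, runs the operation for every pending lane that holds the same descriptor, and retires those lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 256-bit resource descriptor as eight 32-bit words.
using Descriptor = std::array<uint32_t, 8>;

// Calls Op(uniformDescriptor, laneMask) once per distinct descriptor and
// returns the number of loop iterations.
template <typename OpT>
unsigned waterfall(const std::vector<Descriptor> &PerLane, OpT Op) {
  assert(PerLane.size() <= 64 && "lane mask is 64 bits wide");
  uint64_t Pending =
      PerLane.size() == 64 ? ~0ull : (1ull << PerLane.size()) - 1;
  unsigned Iterations = 0;
  while (Pending != 0) {
    // Read the descriptor of the first pending lane into a "scalar".
    std::size_t First = 0;
    while (((Pending >> First) & 1) == 0)
      ++First;
    const Descriptor Uniform = PerLane[First];
    // Select every pending lane whose descriptor matches the scalar value.
    uint64_t Matching = 0;
    for (std::size_t Lane = 0; Lane < PerLane.size(); ++Lane)
      if (((Pending >> Lane) & 1) && PerLane[Lane] == Uniform)
        Matching |= 1ull << Lane;
    Op(Uniform, Matching);
    Pending &= ~Matching; // retire the lanes that were just handled
    ++Iterations;
  }
  return Iterations;
}
```

The loop runs once per distinct descriptor value, so fully uniform input costs a single iteration.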

Fixes: SWDEV-223405

Reviewers:
  arsenm, nhaehnle

Differential Revision:
  https://reviews.llvm.org/D82603
2020-08-18 16:27:36 -07:00