1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

2585 Commits

Author SHA1 Message Date
Jay Foad
3f23d4b8c3 [MachineScheduler] Fix the TopDepth/BotHeightReduce latency heuristics
tryLatency compares two sched candidates. For the top zone it prefers
the one with lesser depth, but only if that depth is greater than the
total latency of the instructions we've already scheduled -- otherwise
its latency would be hidden and there would be no stall.

Unfortunately it only tests the depth of one of the candidates. This can
lead to situations where the TopDepthReduce heuristic does not kick in,
but a lower priority heuristic chooses the other candidate, whose depth
*is* greater than the already scheduled latency, which causes a stall.

The fix is to apply the heuristic if the depth of *either* candidate is
greater than the already scheduled latency.

All this also applies to the BotHeightReduce heuristic in the bottom
zone.

Differential Revision: https://reviews.llvm.org/D72392
2020-07-17 11:02:13 +01:00
Kai Luo
71e1d22d77 [PowerPC] Precommit test case for PR46759. NFC. 2020-07-17 08:41:15 +00:00
Jay Foad
47db2fc583 [PowerPC] Precommit 64-bit funnel shift test cases 2020-07-16 15:20:52 +01:00
Jay Foad
5903937557 [PowerPC] Use CHECK-LABEL for better diagnostics 2020-07-16 13:41:29 +01:00
Amy Kwan
1157465de4 [PowerPC][Power10] Fix VINS* (vector insert byte/half/word) instructions to have i32 arguments.
Previously, the vins* intrinsic was incorrectly defined to have its second and
third argument arguments as an i64. This patch fixes the second and third
argument of the vins* instruction and intrinsic to have i32s instead.

Differential Revision: https://reviews.llvm.org/D83497
2020-07-16 00:30:24 -05:00
Amy Kwan
9e6b3fe312 [PowerPC][Power10] Implement Test LSB by Byte Builtins in LLVM/Clang
This patch implements builtins for the Test LSB by Byte instruction introduced
in Power10.

Differential Revision: https://reviews.llvm.org/D82431
2020-07-13 22:47:47 -05:00
Kai Luo
9f42c54bb4 [PowerPC] Generate CFI directives when probing in prologue
Add missing CFI directives when probing in prologue if
`stack-clash-protection` is enabled.

Differential Revision: https://reviews.llvm.org/D83276
2020-07-14 02:56:12 +00:00
Fangrui Song
b8f06a9e5f [PowerPC] Fix combineVectorShuffle regression after D77448
Commit 1fed131660b2 assumed that NewShuffle (shuffle vector
canonicalization result) will always be ShuffleVectorSDNode, which may
be false (it may be a BITCAST node):

```
...
t12: v4i32 = scalar_to_vector t2
t15: v16i8 = bitcast t12  # LHS
t17: v16i8 = vector_shuffle<u,u,u,u,u,u,u,u,0,1,2,3,u,u,u,u> t15, undef:v16i8  # SVN
```

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D83617
2020-07-13 16:57:27 -07:00
Kai Luo
a7930d8f42 [PowerPC] Enhance tests for D83276. NFC. 2020-07-13 04:37:09 +00:00
Qiu Chaofan
e2a03586ce [PowerPC] Support constrained conversion in SPE target
This patch adds support for constrained int/fp conversion between
signed/unsigned i32 and f32/f64.

Reviewed By: jhibbits

Differential Revision: https://reviews.llvm.org/D82747
2020-07-13 12:18:36 +08:00
Jinsong Ji
6ef62dfa4a [PowerPC][MachinePipeliner] Enable pipeliner if hasInstrSchedModel
P9 is the only one with InstrSchedModel, but we may have more in the
future, we should not hardcoded it to P9, check hasInstrSchedModel
instead.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D83590
2020-07-11 02:24:12 +00:00
Lei Huang
ae49303684 [PowerPC] Enable default support of quad precision operations
Summary: Remove option guarding support of quad precision operations.

Reviewers: nemanjai, #powerpc, steven.zhang

Reviewed By: nemanjai, #powerpc, steven.zhang

Subscribers: qiucf, wuzish, nemanjai, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83437
2020-07-10 13:27:48 -05:00
Kang Zhang
f98ea09169 [NFC][PowerPC] Add a new MIR file to test mi-peephole pass 2020-07-10 16:08:07 +00:00
Kai Luo
8f0afb6658 [PowerPC] Only make copies of registers on stack in variadic function when va_start is called
On PPC64, for a variadic function, if va_start is not called, it won't
access any variadic argument on stack, thus we can save stores of
registers used to pass arguments.

Differential Revision: https://reviews.llvm.org/D82361
2020-07-09 07:18:17 +00:00
Biplob Mishra
20220d08fc [PowerPC] Implement Vector Replace Builtins in LLVM
Provide the LLVM intrinsics needed to implement vector replace element
builtins in altivec.h which will be added in a subsequent patch.

Differential Revision: https://reviews.llvm.org/D83308
2020-07-07 12:22:52 -05:00
Nemanja Ivanovic
2e394d9166 [PowerPC] Do not RAUW combined nodes in VECTOR_SHUFFLE legalization
When legalizing shuffles, we make an attempt to combine it into
a PPC specific canonical form that avoids a need for a swap. If the
combine is successful, we RAUW the node and the custom legalization
replaces the now dead node instead of the one it should replace.
Remove that erroneous call to RAUW.
2020-07-06 22:09:28 -05:00
Biplob Mishra
ff0e91295b [PowerPC] Implement Vector Splat Immediate Builtins in Clang
Implements builtins for the following prototypes:
  vector signed int vec_splati (const signed int);
  vector float vec_splati (const float);
  vector double vec_splatid (const float);
  vector signed int vec_splati_ins (vector signed int, const unsigned int,
                                    const signed int);
  vector unsigned int vec_splati_ins (vector unsigned int, const unsigned int,
                                      const unsigned int);
  vector float vec_splati_ins (vector float, const unsigned int, const float);

Differential Revision: https://reviews.llvm.org/D82520
2020-07-06 20:29:33 -05:00
Amy Kwan
77b45c8014 [PowerPC][Power10] Exploit the xxsplti32dx instruction when lowering VECTOR_SHUFFLE.
This patch aims to exploit the xxsplti32dx XT, IX, IMM32 instruction when lowering VECTOR_SHUFFLEs.
We implement lowerToXXSPLTI32DX when lowering vector shuffles to check if:
- Element size is 4 bytes
- The RHS is a constant vector (and constant splat of 4-bytes)
- The shuffle mask is a suitable mask for the XXSPLTI32DX instruction where it is one of the 32 masks:
<0, 4-7, 2, 4-7>
<4-7, 1, 4-7, 3>

Differential Revision: https://reviews.llvm.org/D83245
2020-07-06 20:28:38 -05:00
jasonliu
3b7308f12c [XCOFF][AIX] Give symbol an internal name when desired symbol name contains invalid character(s)
Summary:

When a desired symbol name contains invalid character that the
system assembler could not process, we need to emit .rename
directive in assembly path in order for that desired symbol name
to appear in the symbol table.

Reviewed By: hubert.reinterpretcast, DiggerLin, daltenty, Xiangling_L

Differential Revision: https://reviews.llvm.org/D82481
2020-07-06 15:49:15 +00:00
Esme-Yi
5f873faf6c [PowerPC] Legalize SREM/UREM directly on P9.
Summary: As Bugzilla-35090 reported, the rationale for using custom lowering SREM/UREM should no longer be true. At the IR level, the div-rem-pairs pass performs the transformation where the remainder is computed from the result of the division when both a required. We should now be able to lower these directly on P9. And the pass also fixed the problem that divide is in a different block than the remainder. This is a patch to remove redundant code and make SREM/UREM legal directly on P9.

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D82145
2020-07-06 11:47:31 +00:00
Kai Luo
f91a303288 [PowerPC] Implement probing for prologue
This patch is part of supporting `-fstack-clash-protection`. Implemented
probing when emitting prologue.

Differential Revision: https://reviews.llvm.org/D81460
2020-07-04 03:07:08 +00:00
Biplob Mishra
65c4fdc701 [PowerPC] Implement Vector Insert Builtins in LLVM/Clang
Implements vec_insertl() and vec_inserth().

Differential Revision: https://reviews.llvm.org/D82365
2020-07-03 15:30:41 -05:00
jasonliu
b6227b52f3 [XCOFF][AIX] Use 'L..' instead of '.L' for getPrivateGlobalPrefix in DataLayout
Summary:
D80831 changed part of the prefix usage for AIX.
But there are other places getting prefix from DataLayout.
This patch intends to make prefix usage consistent on AIX.

Reviewed by: hubert.reinterpretcast, daltenty

Differential Revision: https://reviews.llvm.org/D81270
2020-07-03 18:25:14 +00:00
Sean Fertile
91c9c2c9c7 Enable basepointer for AIX.
Differential Revision: https://reviews.llvm.org/D82030
2020-07-03 11:55:49 -04:00
Kai Luo
8382e3ec50 [PowerPC] Implement probing for dynamic stack allocation
This patch is part of supporting `-fstack-clash-protection`. Mainly do
such things compared to existing `lowerDynamicAlloc`

- Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get
  actual frame pointer and final stack pointer.
- Synthesize a loop to probe by blocks.
- Use DYNAREAOFFSET to get MaxCallFrameSize which is calculated in
  prologepilog.

Differential Revision: https://reviews.llvm.org/D81358
2020-07-03 05:36:40 +00:00
Biplob Mishra
d6caac3195 [PowerPC] Implement Vector Blend Builtins in LLVM/Clang
Implements vec_blendv()

Differential Revision: https://reviews.llvm.org/D82774
2020-07-02 16:52:52 -05:00
Biplob Mishra
283a0554d5 [PowerPC]Implement Vector Permute Extended Builtin
Implements vector permute builtin: vec_permx()

Differential Revision: https://reviews.llvm.org/D82869
2020-07-02 14:53:18 -05:00
Nemanja Ivanovic
84b83b6cc1 [PowerPC] Remove undefs from splat input when changing shuffle mask
As of 1fed131660b2c5d3ea7007e273a7a5da80699445, we have code that
changes shuffle masks so that we can put the shuffle in a canonical
form that can be matched to a single instruction. However, it
does not properly account for undef elements in the BUILD_VECTOR
that is the RHS splat so we can end up with undefs where they
shouldn't be. This patch converts the splat input with undefs to
one without.
2020-07-02 12:26:56 -05:00
Qiu Chaofan
5777c725bf [NFC] Fix typo in triples from unkown to unknown 2020-07-02 16:21:54 +08:00
Biplob Mishra
30db684f8c [PowerPC]Implement Vector Shift Double Bit Immediate Builtins
Implement Vector Shift Double Bit Immediate Builtins in LLVM/Clang.
  * vec_sldb ();
  * vec_srdb ();

Differential Revision: https://reviews.llvm.org/D82440
2020-07-01 20:34:53 -05:00
Anil Mahmud
e9e3ccc44e [PowerPC] Exploit xxspltiw and xxspltidp instructions
Exploits the VSX Vector Splat Immediate Word and
VSX Vector Splat Immediate Double Precision instructions:

  xxspltiw XT,IMM32
  xxspltidp XT,IMM32

Differential Revision: https://reviews.llvm.org/D82911
2020-07-01 19:18:29 -05:00
Stefan Pintilie
91403d8b4a [PowerPC] Fix for PC Relative call protocol
The situation where the caller uses a TOC and the callee does not
but is marked as clobbers the TOC (st_other=1) was not being compiled
correctly if both functions where in the same object file.

The call site where we had `callee` was missing a nop after the call.
This is because it was assumed that since the two functions where in
the same DSO they would be sharing a TOC. This is not the case if the
callee uses PC Relative because in that case it may clobber the TOC.
This patch makes sure that we add the cnop correctly so that the
linker has a place to restore the TOC.

Reviewers: sfertile, NeHuang, saghir

Differential Revision: https://reviews.llvm.org/D81126
2020-07-01 07:08:41 -05:00
Nemanja Ivanovic
608bc8479f [PowerPC] Fix crash for shuffle canonicalization with elt 0 from RHS
Commit 1fed131660b2 assumed that shuffle vector canonicalization will
always ensure that the shuffle mask will be ordered so that element
zero comes from the LHS vector. However there is code out there for
which this is not the case. This patch simply removes that unsafe
assumption and makes the code work regardless of the source of the
first element.
2020-06-29 12:26:08 -05:00
Nemanja Ivanovic
5e8f6beb8e [PowerPC] Don't combine SCALAR_TO_VECTOR without VSX
Most of the patterns for PPCISD::SCALAR_TO_VECTOR_PERMUTED require
VSX. So don't emit them if the subtarget doesn't have VSX.
This resolves the issue reported on
https://reviews.llvm.org/rG1fed131660b2c5d3ea7007e273a7a5da80699445
2020-06-29 09:48:57 -05:00
Esme-Yi
7c2f3bcb24 [NFC][PowerPC] Add run lines to test DivRemPairsPass. 2020-06-28 16:26:05 +00:00
Chen Zheng
47d79b5f12 [MachineLICM] testcase for hoisting rematerializable instruction, nfc 2020-06-28 03:16:57 -04:00
Amy Kwan
04ae1ab1c3 [PowerPC] Add support for llvm.ppc.dcbt, llvm.ppc.dcbtst, llvm.ppc.isync intrinsics
This patch adds LLVM intrinsics for the dcbt (Data Cache Block Touch),
dcbtst (Data Cache Block Touch for Store) and isync (Instruction
Synchronize) instructions.

The intrinsic for dcbt and dcbst in this patch are named llvm.ppc.dcbt.with.hint
and llvm.ppc.dcbtst.with.hint respectively as there already exists an intrinsic
for llvm.ppc.dcbt and llvm.ppc.dcbtst. However, the original variants of the
intrinsics do not accept the TH immediate field, whereas these variants do.

Differential Revision: https://reviews.llvm.org/D79633
2020-06-26 13:02:18 -05:00
Amy Kwan
8e4bd7c3f3 [PowerPC][Power10] Implement centrifuge, vector gather every nth bit, vector evaluate Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cfuged (unsigned long long, unsigned long long);
vector unsigned long long vec_cfuge (vector unsigned long long, vector unsigned long long);
unsigned long long vec_gnb (vector unsigned __int128, const unsigned int);
vector unsigned char vec_ternarylogic (vector unsigned char, vector unsigned char, vector unsigned char, const unsigned int);
vector unsigned short vec_ternarylogic (vector unsigned short, vector unsigned short, vector unsigned short, const unsigned int);
vector unsigned int vec_ternarylogic (vector unsigned int, vector unsigned int, vector unsigned int, const unsigned int);
vector unsigned long long vec_ternarylogic (vector unsigned long long, vector unsigned long long, vector unsigned long long, const unsigned int);
vector unsigned __int128 vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, vector unsigned __int128, const unsigned int);

Differential Revision: https://reviews.llvm.org/D80970
2020-06-25 21:34:41 -05:00
Shawn Landden
df770f665c [PowerPC] add popcount CodeGen test; NFC 2020-06-25 12:41:33 +04:00
Amy Kwan
25f513ca38 [PowerPC][Power10] Implement Count Leading/Trailing Zeroes Builtins under bit Mask in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)

Differential Revision: https://reviews.llvm.org/D80941
2020-06-24 16:03:45 -05:00
Eli Friedman
9d315e1c2b Remove GlobalValue::getAlignment().
This function is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value.  If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.

This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.

Differential Revision: https://reviews.llvm.org/D80368
2020-06-23 19:13:42 -07:00
Chen Zheng
4cc65c325c [PowerPC] fold addi's imm operand to its imm form consumer's displacement
This patch adds a function to do following transformation:

%0:g8rc_and_g8rc_nox0 = ADDI8 %5:g8rc_and_g8rc_nox0, 144
STD killed %7:g8rc, 16, %0:g8rc_and_g8rc_nox0 :: (store 8 into %ir.8)

------>

STD killed %7:g8rc, 160, %5:g8rc_and_g8rc_nox0 :: (store 8 into %ir.8)

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81723
2020-06-23 06:28:18 -04:00
Kai Luo
236d96d31a [PowerPC][NFC] Add tests for variadic functions on PPC64 2020-06-23 09:20:04 +00:00
Amy Kwan
999d7986df [PowerPC][Power10] Implement VSX PCV Generate Operations in LLVM/Clang
This patch implements builtins for the following prototypes for the VSX Permute
Control Vector Generate with Mask Instructions:

vector unsigned char vec_genpcvm (vector unsigned char, const int);
vector unsigned short vec_genpcvm (vector unsigned short, const int);
vector unsigned int vec_genpcvm (vector unsigned int, const int);
vector unsigned long long vec_genpcvm (vector unsigned long long, const int);

Differential Revision: https://reviews.llvm.org/D81774
2020-06-22 21:09:34 -05:00
Amy Kwan
048b03aa1b [PowerPC][Power10] Implement Vector Clear Left/Rightmost Bytes Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:
```
vector signed char vec_clrl (vector signed char a, unsigned int n);
vector unsigned char vec_clrl (vector unsigned char a, unsigned int n);
vector signed char vec_clrr (vector signed char a, unsigned int n);
vector signed char vec_clrr (vector unsigned char a, unsigned int n);
```

Differential Revision: https://reviews.llvm.org/D81707
2020-06-20 18:29:16 -05:00
Nemanja Ivanovic
f6f0a5745d [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE
We currently miss a number of opportunities to emit single-instruction
VMRG[LH][BHW] instructions for shuffles on little endian subtargets. Although
this in itself is not a huge performance opportunity since loading the permute
vector for a VPERM can always be pulled out of loops, producing such merge
instructions is useful to downstream optimizations.
Since VPERM is essentially opaque to all subsequent optimizations, we want to
avoid it as much as possible. Other permute instructions have semantics that can
be reasoned about much more easily in later optimizations.

This patch does the following:
- Canonicalize shuffles so that the first element comes from the first vector
  (since that's what most of the mask matching functions want)
- Switch the elements that come from splat vectors so that they match the
  corresponding elements from the other vector (to allow for merges)
- Adds debugging messages for when a shuffle is matched to a VPERM so that
  anyone interested in improving this further can get the info for their code

Differential revision: https://reviews.llvm.org/D77448
2020-06-18 21:54:22 -05:00
Amy Kwan
31eedc0f69 [PowerPC][Power10] Implement Parallel Bits Deposit/Extract Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

vector unsigned long long vec_pdep(vector unsigned long long, vector unsigned long long);
vector unsigned long long vec_pext(vector unsigned long long, vector unsigned long long __b);
unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
unsigned long long __builtin_pextd (unsigned long long, unsigned long long);

Revision Depends on D80758

Differential Revision: https://reviews.llvm.org/D80935
2020-06-18 16:23:56 -05:00
Kang Zhang
aa0efac1ba [PowerPC] Don't convert Loop to CTR Loop for fp128 BinaryOperator
Summary:
For PPC BinaryOperator of fp128 will become libcall, we shouldn't
convert loop to CTR loop if the loop contain libCall.

But currently, in the PPCTTIImpl::mightUseCTR() function, we only deal
with BinaryOperator for ppc_fp128, don't deal with the fp128.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D81353
2020-06-18 02:54:19 +00:00
Esme-Yi
3a5edae7bf [PowerPC] Custom lower rotl v1i128 to vector_shuffle.
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D81076
2020-06-18 01:32:23 +00:00
Kang Zhang
8a2bb9ff6f [NFC]][PowerPC] Remove unused intrinsic for old CTR loop pass
Summary:

In the patch D62907 the PPC CTRLoops pass has been replaced by Generic
Hardware Loop pass, and it has imported some new intrinsic for Generic
Hardware Loop.

The old intrinsic used in PPC CTRLoops int_ppc_mtctr and
int_ppc_is_decremented_ctr_nonzero is been replaced by
int_set_loop_iterations and loop_decrement.

This patch is to remove above unused two instrinsic.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D81539
2020-06-17 07:06:46 +00:00