mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-21 18:22:53 +01:00
Commit Graph

63656 Commits

Author SHA1 Message Date
sguo35
5521155be5 Fix register clobbering on aarch64 GHC when mixing tail/non-tail calls
By default LLVM doesn't save any registers for GHC on arm64.
This means we'll clobber LR on arm64 if we make non-tail calls (e.g. an L2 syscall),
so we should save LR when making non-tail calls rather than assuming we never
make them.
2022-05-28 23:38:31 +03:00
Malcolm Jestadt
c725f494c9 X86: Avoid converting EVEX to VEX when disp8 would be beneficial
Saves around 2% code size
2022-05-27 15:29:35 +03:00
sguo35
eb7a5e5301 Fix tail call guarantee setting for GHC on arm64 backend 2022-05-05 07:35:52 +03:00
Nekotekina
318b8fe374 X86: fixup matchPMADDWD_3 2021-11-18 12:01:17 +03:00
Nekotekina
1cc7bdd501 X86: improve (V)PMADDWD detection (2)
Implement "full" pattern.
2021-11-16 13:50:49 +03:00
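
A hypothetical C++ sketch of the "full" pattern the commit above targets (the function name and shape are illustrative, not taken from the patch): a 16x16->32 multiply of adjacent pairs followed by their sum, which is what (V)PMADDWD computes per lane.

    #include <cstdint>
    // Illustrative only: each output element is a[2i]*b[2i] + a[2i+1]*b[2i+1],
    // the multiply-add-of-adjacent-pairs shape that maps onto (V)PMADDWD.
    void dot_pairs(const int16_t *a, const int16_t *b, int32_t *out, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = int32_t(a[2 * i]) * b[2 * i] +
                     int32_t(a[2 * i + 1]) * b[2 * i + 1];
    }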
Nekotekina
610c27aa1c X86: disable AVX512 truncate with saturation instructions
These are not very useful in RPCS3, and selecting them causes pessimizations.
2021-11-16 13:45:26 +03:00
Nekotekina
c9fceef173 X86: fixup (V)PMADDWD detection
Fix some bugs (missing checks).
Add constant support.
2021-11-02 17:42:39 +03:00
Nekotekina
f7d625e31a X86: improve (V)PMADDWD detection
In the combineMulToPMADDWD function, the optimization can sometimes also be
applied when the top 17 bits are sign bits, not just when they are zero bits.
For now, detect such SRA pairs and replace them with SRL.
2021-11-02 17:42:39 +03:00
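
A hedged C++ sketch of the extended case described above (names and the shift-by-16 are illustrative): the i32 operands are not zero-extended 16-bit values, but they still have at least 17 sign bits, so the 16x16->32 multiply remains exact.

    #include <cstdint>
    // Illustrative only: an arithmetic shift right by 16 leaves at least 17
    // sign bits in each operand, so the products still fit the PMADDWD model
    // even though the high bits are not zero.
    void mul_narrow_signed(const int32_t *a, const int32_t *b, int32_t *out, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = (a[i] >> 16) * (b[i] >> 16);
    }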
Nekotekina
c36b21c023 X86: modify PreserveAll CC to save full AVX-512 state 2021-11-02 17:42:39 +03:00
Nekotekina
548daf04b5 X86: avoid vector-scalar shifts if splat amount is directly a vector ADD/SUB/AND op.
Prefer vector-vector shifts if available (AVX2+).
Improves code generated for rotate and funnel shifts.
Otherwise a shuffle plus a slower vector-scalar shift would be generated.
2021-11-02 17:42:39 +03:00
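
A minimal C++ sketch of the kind of code the commit above improves (an assumed example, not from the patch): a rotate by a uniform amount, where the right-shift amount is derived from the left-shift amount by a subtraction, so keeping both shifts as vector-vector shifts on AVX2+ avoids the shuffle plus vector-scalar shift.

    #include <cstdint>
    // Rotate each element left by s; after vectorization the (32 - s) side is
    // the splat-of-a-SUB shift amount the commit message refers to.
    void rotl32(uint32_t *v, unsigned s, int n) {
        for (int i = 0; i < n; ++i)
            v[i] = (v[i] << (s & 31)) | (v[i] >> ((32 - s) & 31));
    }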
Nekotekina
d5bc359dfd X86: add patterns for X86ISD::VSHLV and X86ISD::VSRLV
Replace VSELECT instructions that zero their result when the shift amount exceeds the legal SHL/SRL range.
2021-11-02 17:42:39 +03:00
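
A scalar C++ model of the semantics being pattern-matched (illustrative, assuming AVX2 variable-shift behaviour): the hardware shifts already yield zero for out-of-range amounts, so the explicit select-with-zero is redundant.

    #include <cstdint>
    // vselect(amt < 32, x << amt, 0) is what vpsllvd already computes per lane;
    // likewise for the logical right shift and vpsrlvd.
    uint32_t shl_like_vpsllvd(uint32_t x, uint32_t amt) {
        return amt < 32 ? x << amt : 0;
    }
    uint32_t srl_like_vpsrlvd(uint32_t x, uint32_t amt) {
        return amt < 32 ? x >> amt : 0;
    }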
Nekotekina
bed700114e X86: add pattern for X86ISD::VSRAV
Detect clamping of the ashr shift amount to the maximum legal value.
2021-11-02 17:42:39 +03:00
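
A scalar C++ model of the clamp being detected (illustrative): vpsravd treats amounts of 32 or more as 31, filling with sign bits, so clamping the ashr amount to the maximum legal value matches its behaviour.

    #include <cstdint>
    // x >> min(amt, 31) matches the per-lane behaviour of vpsravd for any amt.
    int32_t sra_like_vpsravd(int32_t x, uint32_t amt) {
        return x >> (amt < 31 ? amt : 31);
    }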
Nekotekina
2ffa82223f X86: expand detectAVGPattern()
Allow all integer widths in the pattern and allow ashr.
Handle signed and mixed cases, allowing the truncation to be replaced.
2021-11-02 17:42:39 +03:00
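
A minimal C++ sketch of the classic shape handled by detectAVGPattern (illustrative): widen, add, add one, shift right, truncate back; the commit extends the detection to more widths, ashr, and signed/mixed inputs.

    #include <cstdint>
    // Rounding average of unsigned bytes; the widened (a + b + 1) >> 1 followed
    // by truncation is the pattern that maps onto PAVGB.
    void avg_u8(const uint8_t *a, const uint8_t *b, uint8_t *r, int n) {
        for (int i = 0; i < n; ++i)
            r[i] = uint8_t((unsigned(a[i]) + b[i] + 1) >> 1);
    }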
Nekotekina
5ff8f4151c X86: optimize VSELECT for v16i8 with shl + sign bit test 2021-11-02 17:42:39 +03:00
Nekotekina
4743d020ce X86: LowerShift: new algorithm for vector-vector shifts
Emit a pair of double-width shifts when possible.
2021-11-02 17:42:39 +03:00
Nekotekina
d18817ded9 X86: Fix/workaround Small Code Model for JIT
Force RIP-relative jump tables and global values
These things were causing crashes due to use of absolute addressing
2021-11-02 17:42:39 +03:00
guopeilin
39c406a58f [AArch64][GlobalISel] Use ZExtValue for zext(xor) when invert tb(n)z
Currently, we use SExtValue to decide whether to invert tbz or tbnz.
However, for the case zext (xor x, c), we should use ZExt rather than SExt;
otherwise we will generate branches with the opposite sense.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D108755

(cherry picked from commit 5f48c144c58f6d23e850a1978a6fe05887103b17)
2021-09-21 09:15:10 -07:00
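
A small standalone C++ illustration of why the extension kind matters (the values are chosen for illustration, not taken from the patch): sign- and zero-extending the same 8-bit constant give different wide values, so testing a bit of the wrong one flips the sense of the generated tb(n)z.

    #include <cstdint>
    #include <cstdio>
    int main() {
        int8_t c = int8_t(0x80);
        uint64_t asSExt = uint64_t(int64_t(c));  // 0xffffffffffffff80
        uint64_t asZExt = uint64_t(uint8_t(c));  // 0x0000000000000080
        // Bit 63 differs between the two extensions, so a tbz/tbnz built from
        // the wrong one would branch in exactly the opposite direction.
        printf("bit 63: sext=%d zext=%d\n",
               int((asSExt >> 63) & 1), int((asZExt >> 63) & 1));
        return 0;
    }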
Simon Pilgrim
aed4e7449f [X86] combineX86ShuffleChain - ensure we only peek through bitcasts to vectors (PR51858)
When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.

(cherry picked from commit dcba99418438ec1d624ad207674234bd2e9e3394)
2021-09-20 11:22:27 -07:00
Tom Stellard
7b93a88a1e Revert "[AArch64][GlobalISel] Legalize bswap <2 x i16>"
This reverts commit 5cd63e9ec2a385de2682949c0bbe928afaf35c91.

https://bugs.llvm.org/show_bug.cgi?id=51707
2021-09-10 21:09:59 -07:00
Elliot Saba
a967752a95 [X86] Don't clobber EBX in stackprobes
On X86, the stackprobe emission code chooses the `R11D` register, which
is illegal on i686.  This ends up wrapping around to `EBX`, which does
not get properly callee-saved within the stack probing prologue,
clobbering the register for the callers.

We fix this by explicitly using `EAX` as the stack probe register.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D109203

(cherry picked from commit ae8507b0df738205a6b9e3795ad34672b7499381)
2021-09-10 09:30:52 -07:00
Bradley Smith
28d769100d Workaround incorrect types when lowering fixed length gather/scatter
When lowering a fixed length gather/scatter, the index type is assumed to
be the same as the memory type; this is incorrect in cases where the
extension of the index has been folded into the addressing mode.

For now, add a temporary workaround that fixes the resulting codegen faults
by preventing the removal of this extension. At a later date the lowering
for SVE gather/scatters will be redesigned to improve the way addressing
modes are handled.

As a short term side effect of this change, the addressing modes
generated for fixed length gather/scatters will not be optimal.

Differential Revision: https://reviews.llvm.org/D109145

(cherry picked from commit 14e1a4a6eef2fb95ec852c9ddfc597f80bba3226)
2021-09-09 09:05:58 -07:00
Cullen Rhodes
5f6ef6fbfd [AArch64][SME] Fix imm bug in mov vector to tile aliases
Also fixes a warning mentioned in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109363

(cherry picked from commit 89786c2b992c3cb4c4a230542d2af34ec2915a08)
2021-09-08 20:47:08 -07:00
David Truby
e0d7c39869 [AArch64][sve] Prevent incorrect function call on fixed width vector
The isEssentiallyExtractHighSubvector function currently calls
getVectorNumElements on a type that in specific cases might be scalable.
Since this function only has correct behaviour at the moment on scalable
types anyway, the function can just return false when given a fixed type.

Differential Revision: https://reviews.llvm.org/D109163

(cherry picked from commit b297531ece896fb9ec36f001a74aef144082602b)
2021-09-08 06:09:19 -07:00
Fraser Cormack
ba85498148 [RISCV] Fix reporting of incorrect commutable operand indices
This patch fixes an issue where RISCV's `findCommutedOpIndices` would
incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable
operand index, rather than fixing a specific index.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D108206

(cherry picked from commit 5b06cbac11e53ce55f483c1852a108012507a6bb)
2021-09-03 15:48:26 -07:00
Nikita Popov
4ddceef928 [WebAssembly] Fix FastISel of condition in different block (PR51651)
If the icmp is in a different block, then the register for the icmp
operand may not be initialized, as it nominally does not have
cross-block uses. Add a check that the icmp is in the same block
as the branch, which should be the common case.

This matches what X86 FastISel does:
5b6b090cf2/llvm/lib/Target/X86/X86FastISel.cpp (L1648)

The "not" transform that could have a similar issue is dropped
entirely, because it is currently dead: The incoming value is
a branch or select condition of type i1, but this code requires
an i32 to trigger.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51651.

Differential Revision: https://reviews.llvm.org/D108840

(cherry picked from commit 16086d47c0d0cd08ffae8e69a69c88653e654d01)
2021-08-31 20:58:25 -07:00
Ricky Taylor
8fbe4ddc7c [M68k] Update pointer data layout
Fixes PR51626.

The M68k requires that all instruction, word and long word reads are
aligned to word boundaries. From the 68020 onwards, there is a
performance benefit from aligning long words to long word boundaries.

The M68k uses the same data layout for pointers and integers.

In line with this, this commit updates the pointer data layout to
match the layout already set for 32-bit integers: 32:16:32.

Differential Revision: https://reviews.llvm.org/D108792

(cherry picked from commit 8d3f112f0cdbed2311aead86bcd72e763ad55255)
2021-08-31 20:56:41 -07:00
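
An illustrative data-layout fragment (assumed for illustration, not the full M68k layout string) showing the size:abi:pref notation the commit refers to, with pointers matching the 32:16:32 layout of 32-bit integers.

    // "p:32:16:32" = 32-bit pointers with 16-bit ABI alignment and 32-bit
    // preferred alignment, mirroring the existing "i32:16:32" integer entry.
    static const char *M68kPtrLayoutFragment = "p:32:16:32-i32:16:32";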
Ricky Taylor
de85b171b7 [M68k][NFC] Rename M68kOperand::Kind to KindTy
Rename the M68kOperand::Type enumeration to KindTy to avoid ambiguity
with the Kind field when referencing enumeration values e.g.
`Kind::Value`.

This works around a compilation error under GCC 5, where GCC won't
lookup enum class values if you have a similarly named field
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60994).

The error in question is:
`M68kAsmParser.cpp:857:8: error: 'Kind' is not a class, namespace, or enumeration`

Differential Revision: https://reviews.llvm.org/D108723

(cherry picked from commit f659b6b1fa43ffb8c95dbbf767ef57f6e964e7f6)
2021-08-30 21:40:39 -07:00
Tom Stellard
5aea8f0472 Revert "[RISCV] Fix reporting of incorrect commutable operand indices"
This reverts commit a7933290f72a08dc060d38fa52772a9cc33ed9ba.

This commit caused some bot failures:

clang-with-thin-lto-ubuntu-release
lld-x86_64-win-release
llvm-clang-x86_64-expensive-checks-debian-release
2021-08-24 21:59:54 -07:00
Fraser Cormack
2424302e99 [RISCV] Fix reporting of incorrect commutable operand indices
This patch fixes an issue where RISCV's `findCommutedOpIndices` would
incorrectly return the pseudo `CommuteAnyOperandIndex` as a commutable
operand index, rather than fixing a specific index.

Reviewed By: rogfer01

Differential Revision: https://reviews.llvm.org/D108206

(cherry picked from commit 5b06cbac11e53ce55f483c1852a108012507a6bb)
2021-08-24 10:20:28 -07:00
Nikita Popov
e4e6f3eeff [AArch64] Fix comparison peephole opt with non-0/1 immediate (PR51476)
This is a non-intrusive fix for
https://bugs.llvm.org/show_bug.cgi?id=51476 intended for backport
to the 13.x release branch. It expands on the current hack by
distinguishing between CmpValue of 0, 1 and 2, where 0 and 1 have
the obvious meaning and 2 means "anything else". The new optimization
from D98564 should only be performed for CmpValue of 0 or 1.

For main, I think we should switch the analyzeCompare() and
optimizeCompare() APIs to use int64_t instead of int, which is in
line with MachineOperand's notion of an immediate, and avoids this
problem altogether.

Differential Revision: https://reviews.llvm.org/D108076

(cherry picked from commit 81b106584f2baf33e09be2362c35c1bf2f6bfe94)
2021-08-18 20:07:23 -07:00
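
A hedged sketch of the CmpValue encoding described above (assumed shape, not the actual patch): 0 and 1 keep their meaning and every other immediate collapses to 2, so the D98564 optimization can be restricted to the 0/1 cases.

    #include <cstdint>
    // Hypothetical helper: map an immediate to the three-valued CmpValue.
    static int encodeCmpValue(int64_t Imm) {
        return (Imm == 0 || Imm == 1) ? int(Imm) : 2;
    }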
Simon Pilgrim
45d26b8826 [X86][AVX] Extract SUBV_BROADCAST constant bits from just the lower subvector range (PR51281)
As reported on PR51281, an internal fuzz test encountered an issue when extracting constant bits from a SUBV_BROADCAST node from a constant pool source larger than the broadcasted subvector width.

getTargetConstantBitsFromNode was assuming that the Constant would be the same size as the subvector, resulting in incorrect packing of the per-element bits data.

This patch attempts to solve this by using the SUBV_BROADCAST node to determine the subvector width, and then ensuring we extract only the lowest subvector-width bits from the Constant.

Differential Revision: https://reviews.llvm.org/D107158

(cherry picked from commit 18e6a03b1a15b2661259af15ae604b4c4850cd61)
2021-08-18 12:15:46 -07:00
Tomas Matheson
4d78ad44fb [ARM][atomicrmw] Fix CMP_SWAP_32 expand assert
This assert is intended to ensure that the high registers are not selected
when a register is passed to one of the Thumb UXT instructions. However, it
was triggering even for 32-bit, where no UXT instruction is emitted.

Fixes PR51313.

Differential Revision: https://reviews.llvm.org/D107363

(cherry picked from commit 40650f27b5df95b2f96d25ea03976d8136804441)
2021-08-18 12:14:24 -07:00
Amy Kwan
8202608068 [PowerPC] Disable CTR loop generation for fma with the PPC double-double type.
It is possible to generate the llvm.fmuladd.ppcf128 intrinsic, and there is no actual
FMA instruction that corresponds to this intrinsic call for ppcf128. Thus, this
intrinsic needs to remain as a call as it cannot be lowered to any instruction, which
also means we need to disable CTR loop generation for fma involving the ppcf128 type.
This patch accomplishes this behaviour.

Differential Revision: https://reviews.llvm.org/D107914

(cherry picked from commit 581a80304c671b6cb2b1b1f87feb9fbe14875f2a)
2021-08-17 20:22:13 -07:00
Andrea Di Biagio
681b643c07 [X86][SchedModel] Add missing ReadAdvance for some arithmetic ops (PR51318 and PR51322).
This fixes a bug where implicit uses of EFLAGS were not marked as ReadAdvance in
the RM/MR variants of ADC/SBB (PR51318)

This also fixes the absence of ReadAdvance for the register operand of
RMW arithmetic instructions (PR51322).

Differential Revision: https://reviews.llvm.org/D107367

(cherry picked from commit 7a1a35a1d1ae2e69769505c9f39910067c53d53b)
2021-08-11 21:40:03 -07:00
Evandro Menezes
0c8a79e78d [RISCV] Add scheduling resources for V
Add the scheduling resources for the V extension instructions.

Differential Revision: https://reviews.llvm.org/D98002

(cherry picked from commit 63a5ac4e0d969f41bf71785cc3979349a45a2892)
2021-08-10 23:11:38 -07:00
Bradley Smith
ee15bdbb06 [AArch64][SVE] Fix assertion failure when lowering fixed length gather/scatter
The patterns for fixed length gather/scatter with 32-bit offsets and a
64-bit memory type are slightly different from the rest of the patterns;
as such, the lowering needs to be slightly different to ensure the correct
types are used.

Differential Revision: https://reviews.llvm.org/D107576

(cherry picked from commit 73ecb9987b00db274b7b2ac34b0602ffdb906a4b)
2021-08-10 15:34:36 -07:00
Yonghong Song
f6a86e448a BPF: avoid NE/EQ loop exit condition
Kuniyuki Iwashima reported in [1] that the llvm compiler may
convert a loop exit condition from "i < bound" to "i != bound", where
"i" is the loop index variable and "bound" is the upper bound.
When "bound" is not a constant, the verifier always considers "i != bound"
to be potentially true, which causes a verifier failure since, to the
verifier, this is an infinite loop.

The fix is to avoid transforming "i < bound" into "i != bound".
In llvm, the transformation is done by the IndVarSimplify pass.
The compiler checks the cost of the loop condition update (i = i + 1) and,
if the cost is low, it may transform "i < bound" into "i != bound".
This patch implements getArithmeticInstrCost() in the BPF TargetTransformInfo
class to return a higher cost for such an operation, which prevents the
transformation for the test case added in this patch.

 [1] https://lore.kernel.org/netdev/1994df05-8f01-371f-3c3b-d33d7836878c@fb.com/

Differential Revision: https://reviews.llvm.org/D107483

(cherry picked from commit e52946b9ababcbf8e6f40b1b15900ae2e795a1c6)
2021-08-06 12:45:53 -07:00
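
A minimal C++ example of the loop shape at issue (illustrative): with a non-constant bound, rewriting the exit test as "i != bound" leaves the BPF verifier unable to bound the loop, so the transform is made unprofitable for BPF.

    // The exit condition should stay "i < bound" rather than become "i != bound".
    int sum_first(const int *arr, int bound) {
        int s = 0;
        for (int i = 0; i < bound; ++i)
            s += arr[i];
        return s;
    }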
Craig Topper
7c9c296915 [RISCV] Restrict performANY_EXTENDCombine to prevent an infinite loop.
The sign_extend we insert here can get turned into a zero_extend if
the sign bit is known zero. This can enable a setcc combine that
shrinks compares with zero_extend. This reduces the use count of
the zero_extend allowing other combines to turn it back into an
any_extend.

This restricts the combine to only cases where the result is used
by a CopyToReg. This works for my original motivating case. I
hope the CopyToReg use will prevent any converted extends from
turning back into an any_extend.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106754

(cherry picked from commit 54588bcc052e5b08f90e672c33d0c1ad4eda2424)
2021-08-02 11:31:08 -07:00
Alexandros Lamprineas
276fcebbe0 [AArch64] Legalize MVT::i64x8 in DAG isel lowering
This patch legalizes the Machine Value Type introduced in D94096 for loads
and stores. A new target hook named getAsmOperandValueType() is added which
maps i512 to MVT::i64x8. GlobalISel falls back to DAG for legalization.

Differential Revision: https://reviews.llvm.org/D94097
2021-08-02 15:45:58 +01:00
Bradley Smith
183b0c7c98 [AArch64][SVE] Fix incorrect mask type when lowering fixed type SVE gather/scatter
An incorrect mask type when lowering an SVE gather/scatter was causing a
codegen fault which manifested as the incorrect predicate size being used
for an SVE gather/scatter (e.g. p0.b rather than p0.d).

Fixes PR51182.

Differential Revision: https://reviews.llvm.org/D106943

(cherry picked from commit 191831e380f317cd2baa5d48abe02d1d11cd44cb)
2021-07-29 07:03:40 -07:00
Xiang1 Zhang
a6d5003afd [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT
Differential Revision: https://reviews.llvm.org/D106780
2021-07-28 08:16:59 +08:00
Xiang1 Zhang
5d447ad589 Revert "[X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT"
This reverts commit 6ff73efea94621e74642e4d7a15cc86a5fb6d411.
2021-07-28 08:12:29 +08:00
Xiang1 Zhang
409f0eedd6 [X86] Fix lowering to illegal type in LowerINSERT_VECTOR_ELT 2021-07-28 08:08:30 +08:00
Krzysztof Parzyszek
60850cdc6a [Hexagon] Fix resetting dead registers in DBG_VALUE_LISTs
This fixes https://llvm.org/PR51229.
2021-07-27 18:36:28 -05:00
Nemanja Ivanovic
8b3f85a32c [PowerPC] Turn deprecated altivec prefetch instrs to nops on AIX
The dst/dstt/dstst/dststt instructions are nop's on all PowerPC
cores that AIX supports. The AIX assembler also does not accept
these mnemonics. Turn them into nop's on AIX (similar to dstall).
2021-07-27 15:50:02 -05:00
Sanjay Patel
11aa71a71d [x86] update stale code comment; NFC
The transform was generalized with:
1ce05ad619a5
2021-07-27 16:45:52 -04:00
Matt Arsenault
ece3299a71 AMDGPU/GlobalISel: Fix selecting G_SEXTLOAD/G_ZEXTLOAD pre-gfx9
The m0 glue patterns were failing to import.
2021-07-27 15:56:42 -04:00
Amara Emerson
6ce8f2f7c1 [AArch64][GlobalISel] Fix constraining LDXPX intrinsic selection.
This caused a fallback due to missing register classes on vregs; in builds
without asserts, we instead end up crashing later in codegen.
2021-07-27 12:13:56 -07:00
Craig Topper
6bfc6b8665 [RISCV] Select vector shl by 1 to a vector add.
A vector add may be faster than a vector shift.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D106689
2021-07-27 10:57:28 -07:00
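
A source-level C++ sketch of the selection change (illustrative): a vector shift left by one is equivalent to adding each element to itself, and the add form may be cheaper on some RISC-V vector implementations.

    #include <cstdint>
    // After vectorization, the shift-by-one may be selected as a vector add of
    // the value with itself instead of a vector shift.
    void shl_by_one(int32_t *v, int n) {
        for (int i = 0; i < n; ++i)
            v[i] = v[i] << 1;
    }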
Matt Arsenault
8979bda8e8 AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
2021-07-27 11:44:38 -04:00