1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
Commit Graph

218400 Commits

Author SHA1 Message Date
Matt Arsenault
57c0d2e25b GlobalISel: Use extension instead of merge with undef in common case
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.

This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
2021-07-13 11:04:47 -04:00
Matt Arsenault
b1584af557 GlobalISel: Remove getIntrinsicID utility function
This is redundant with a method directly on MachineInstr
2021-07-13 11:04:10 -04:00
Matt Arsenault
2398c72d1f Mips/GlobalISel: Use more standard call lowering infrastructure
This also fixes some missing implicit uses on call instructions, adds
missing G_ASSERT_SEXT/ZEXT annotations, and some missing outgoing
sext/zexts. This also fixes not respecting tablegen requested type
promotions.

This starts treating f64 passed in i32 GPRs as a type of custom
assignment, which restores some previously XFAILed tests. This is due
to getNumRegistersForCallingConv returns a static value, but in this
case it is context dependent on other arguments.

Most of the ugliness is reproducing a hack CC_MipsO32 uses in
SelectionDAG. CC_MipsO32 depends on a bunch of vectors populated from
the original IR argument types in MipsCCState. The way this ends up
working in GlobalISel is it only ends up inspecting the most recently
added vector element. I'm pretty sure there are cleaner ways to do
this, but this seemed easier than fixing up the current DAG
handling. This is another case where it would be easier of the
CCAssignFns were passed the original type instead of only the
pre-legalized ones.

There's still a lot of junk here that shouldn't be necessary. This
also likely breaks big endian handling, but it wasn't complete/tested
anyway since the IRTranslator gives up on big endian targets.
2021-07-13 11:04:10 -04:00
Matt Arsenault
b34fbda5c6 Mips: Mark special case calling convention handling as custom
The number of registers used for passing f64 in some cases is context
dependent, and thus getNumRegistersForCallingConv is sometimes
inaccurate. For f64, it reports 1 but is sometimes split into 2 32-bit
registers.

For GlobalISel, the generic argument assignment code expects
getNumRegistersForCallingConv to return an accurate answer. Switch to
marking these arguments as custom so we can deal with this case as a
custom assignment rather.

This temporarily breaks a few globalisel tests which are fixed by a
future change to use more of the generic infrastructure.
2021-07-13 11:04:10 -04:00
Simon Pilgrim
552824ecdc [InstCombine] Fold lshr/ashr(or(neg(x),x),bw-1) --> zext/sext(icmp_ne(x,0)) (PR50816)
Handle the missing fold reported in PR50816, which is a variant of the existing ashr(sub_nsw(X,Y),bw-1) --> sext(icmp_sgt(X,Y)) fold.

We also handle the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) equivalent - https://alive2.llvm.org/ce/z/SnZmSj

We still allow multi uses of the neg(x) - as this is likely to let us further simplify other uses of the neg - but not multi uses of the or() which would increase instruction count.

Differential Revision: https://reviews.llvm.org/D105764
2021-07-13 14:44:54 +01:00
Simon Pilgrim
ff9425b575 [InstCombine] Pre-commit ashr(or(neg(x),x),bw-1) --> sext(icmp_ne(x,0)) tests from D105764
Added 'thwart complexity-based canonicalization' hacks and the lshr(or(neg(x),x),bw-1) --> zext(icmp_ne(x,0)) variants suggested by Sanjay.
2021-07-13 13:48:17 +01:00
Simon Pilgrim
920f3945dc [X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result (REAPPLIED)
Annoyingly, i686 cmpsd handling still fails to remove the unnecessary neg(and(x,1))

Reapplied rGe4aa6ad13216 with fix for intrinsic variants of the opcode which uses a vector return type
2021-07-13 12:31:09 +01:00
Hafiz Abid Qadeer
5ad78e6343 [AMDGPU] Handle s_branch to another section.
Currently, if target of s_branch instruction is in another section, it will fail with the error of undefined label.  Although in this case, the label is not undefined but present in another section. This patch tries to handle this issue. So while handling fixup_si_sopp_br fixup in getRelocType, if the target label is undefined we issue an error as before. If it is defined, a new relocation type R_AMDGPU_REL16 is returned.

This issue has been reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100181 and https://bugs.llvm.org/show_bug.cgi?id=45887. Before https://reviews.llvm.org/D79943, we used to get an crash for this scenario. The crash is fixed now but the we still get an undefined label error.  Jumps to other section can arise with hold/cold splitting.

A patch to handle the relocation in lld will follow shortly.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D105760
2021-07-13 12:17:47 +01:00
Simon Pilgrim
b6e292416f [X86][SSE] Add signbit tests to show cmpss/cmpsd intrinsics not recognised as 'allbits' results.
This adds test coverage for the crash reported on rGe4aa6ad13216
2021-07-13 11:25:52 +01:00
Tim Northover
17974afd26 Support: reduce stack used in default size test.
When the sanitizers aren't enabled they can use more than 1KB of stack, causing
an overflow where there shouldn't be.

Should fix Green Dragon test.
2021-07-13 11:24:12 +01:00
Sebastian Neubauer
1faa7a9983 [AMDGPU] Optimize VGPR LiveRange in waterfall loops
The loops are run exactly once per lane, so VGPRs do not need to be
saved. Use the SIOptimizeVGPRLiveRange pass to add phi nodes that take
undef when coming from the loop.

There is still a shortcoming:
Return values from a function call in the loop are copied because their
live range conflicts with the live range of arguments, even if arguments
are only IMPLICIT_DEF after the phi insertion.

Differential Revision: https://reviews.llvm.org/D105192
2021-07-13 12:15:08 +02:00
Sebastian Neubauer
9f94b4e5f6 [AMDGPU] Mark waterfall loops as SI_WATERFALL_LOOP
This way, they can be detected later, e.g. by the
SIOptimizeVGPRLiveRange pass.

Differential Revision: https://reviews.llvm.org/D105467
2021-07-13 12:15:08 +02:00
Tim Northover
ab18a3752b AArch64: use 4-byte slots for arm64_32 pointers in a tail call 2021-07-13 11:08:59 +01:00
Fraser Cormack
d906274eab [RISCV] Pass undef VECTOR_SHUFFLE indices on to BUILD_VECTOR
Often when lowering vector shuffles, we split the shuffle into two
LHS/RHS shuffles which are then blended together. To do so we split the
original indices into two, indexed into each respective vector. These
two index vectors are then separately lowered as BUILD_VECTORs.

This patch forwards on any undef indices to the BUILD_VECTOR, rather
than having the VECTOR_SHUFFLE lowering decide on an optimal concrete
index. The motiviation for ths change is so that we don't duplicate
optimization logic between the two lowering methods and let BUILD_VECTOR
do what it does best.

Propagating undef in this way allows us, for example, to generate
`vid.v` to produce the LHS indices of commonly-used interleave-type
shuffles. I have designs on further optimizing interleave-type and other
common shuffle patterns in the near future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104789
2021-07-13 10:41:54 +01:00
Jeroen Dobbelaere
9fe5660816 [remangleIntrinsicFunction] Detect and resolve name clash
It is possible that the remangled name for an intrinsic already exists with a different (and wrong) prototype within the module.
As the bitcode reader keeps both versions of all remangled intrinsics around for a longer time, this can result in a
crash, as can be seen in https://bugs.llvm.org/show_bug.cgi?id=50923

This patch makes 'remangleIntrinsicFunction' aware of this situation. When it is detected, it moves the version with the wrong prototype to a different name. That version will be removed anyway once the module is completely loaded.

With thanks to @asbirlea for reporting this issue when trying out an lto build with the full restrict patches, and @efriedma for suggesting a sane resolution mechanism.

Reviewed By: apilipenko

Differential Revision: https://reviews.llvm.org/D105118
2021-07-13 11:21:12 +02:00
Jeroen Dobbelaere
c2599a2d12 [NFC] Do not track calls to inlined intrinsics in IFI.
Just like intrinsics are not tracked for IFI.InlinedCalls, they should not be tracked for IFI.InlinedCallSites.

In the current top-of-tree this change is a NFC, but the full restrict patches (D68484) potentially trigger an read-after-free
if intrinsics are also added to the InlindeCallSites, due to a late optimization potentially removing some of the inlined intrinsics.

Also see https://lists.llvm.org/pipermail/llvm-dev/2021-July/151722.html for a discussion about the problem.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D105805
2021-07-13 10:18:23 +02:00
Qiu Chaofan
b88bbb316a [SelectionDAG] Check use before combining into USUBSAT
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105789
2021-07-13 14:50:26 +08:00
Stanislav Mekhanoshin
327d625bef [AMDGPU] Make some VOP1 instructions rematerializable
This is a pilot change to verify the logic. The rest will be
done in a same way, at least the rest of VOP1.

Differential Revision: https://reviews.llvm.org/D105742
2021-07-12 23:43:45 -07:00
Qiu Chaofan
4bca2e2f76 [PowerPC] Fix typo in vector shuffle combining
a22ecb4 fixed a crash on big endian subtargets. This commit fixes a typo
in that commit which may cause miscompile.
2021-07-13 14:35:47 +08:00
hyeongyu kim
a52de99abc [SimplifyCFG] Fix SimplifyBranchOnICmpChain to be undef/poison safe.
This patch fixes the problem of SimplifyBranchOnICmpChain that occurs
when extra values are Undef or poison.

Suppose the %mode is 51 and the %Cond is poison, and let's look at the
case below.
```
	%A = icmp ne i32 %mode, 0
	%B = icmp ne i32 %mode, 51
	%C = select i1 %A, i1 %B, i1 false
	%D = select i1 %C, i1 %Cond, i1 false
	br i1 %D, label %T, label %F
=>
	br i1 %Cond, label %switch.early.test, label %F
switch.early.test:
	switch i32 %mode, label %T [
		i32 51, label %F
		i32 0, label %F
	]
```
incorrectness: https://alive2.llvm.org/ce/z/BWScX

Code before transformation will not raise UB because %C and %D is false,
and it will not use %Cond. But after transformation, %Cond is being used
immediately, and it will raise UB.

This problem can be solved by adding freeze instruction.

correctness: https://alive2.llvm.org/ce/z/x9x4oY

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104569
2021-07-13 15:35:18 +09:00
David Green
bae4354177 [MIPS] Regenerate test after D105161. NFC 2021-07-13 07:31:22 +01:00
David Green
78d2c33a9a [ARM] Introduce MVEEXT ISel lowering
Similar to D91921 (and D104515) this introduces two MVESEXT and MVEZEXT
nodes that larger-than-legal sext and zext are lowered to. These either
get optimized away or end up becoming a series of stack loads/store, in
order to perform the extending whilst keeping the order of the lanes
correct. They are generated from v8i16->v8i32, v16i8->v16i16 and
v16i8->v16i32 extends, potentially with a intermediate extend for the
larger v16i8->v16i32 extend. A number of combines have been added for
obvious cases that come up in tests, notably MVEEXT of shuffles. More
may be needed in the future, but this seems to cover most of the cases
that come up in the tests.

Differential Revision: https://reviews.llvm.org/D105090
2021-07-13 07:21:20 +01:00
hyeongyu kim
7ec8c20526 [NFC] Edit the comment in M68kInstrInfo::ExpandMOVSZX_RM 2021-07-13 15:10:24 +09:00
Chen Zheng
be560ab706 [PowerPC][NFC] add test case for preparing more loads/stores 2021-07-13 06:00:19 +00:00
Vitaly Buka
790e45b85d Revert "[X86][SSE] X86ISD::FSETCC nodes (cmpss/cmpsd) return a 0/-1 allbits signbits result"
Fails here https://lab.llvm.org/buildbot/#/builders/37/builds/5267

This reverts commit e4aa6ad132164839a4a97dff0d433ea4766f77f1.
2021-07-12 22:26:54 -07:00
Jessica Paquette
a50fc460b2 [GlobalISel] Handle more types in narrowScalar for eq/ne G_ICMP
Generalize the existing eq/ne case using `extractParts`. The original code only
handled narrowings for types of width 2n->n. This generalization allows for any
type that can be broken down by `extractParts`.

General overview is:

- Loop over each narrow-sized part and do exactly what the 2-register case did.
- Loop over the leftover-sized parts and do the same thing
- Widen the leftover-sized XOR results to the desired narrow size
- OR that all together and then do the comparison against 0 (just like the old
  code)

This shows up a lot when building clang for AArch64 using GlobalISel, so it's
worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne
case yet.

Also remove the code in this case that notifies the observer; we're just going
to delete MI anyway so talking to the observer shouldn't be necessary.

Differential Revision: https://reviews.llvm.org/D105161
2021-07-12 22:18:50 -07:00
Arthur Eubanks
e83128ce64 [OpaquePtr][ISel] Use ArgListEntry::IndirectType more 2021-07-12 21:14:35 -07:00
Arthur Eubanks
59af7a4d0f [OpaquePointers][ThreadSanitizer] Cleanup calls to PointerType::getElementType()
Reviewed By: #opaque-pointers, dblaikie

Differential Revision: https://reviews.llvm.org/D105710
2021-07-12 20:46:08 -07:00
David Blaikie
ff9d1e8f1a Fix test - mistaken hardcoded path from my local machine. 2021-07-12 18:39:41 -07:00
David Blaikie
1a643380ca DebugInfo: Use debug_rnglists.dwo for ranges in debug_info.dwo when parsing DWARFv5
This call would incorrectly overwrite (with the .debug_rnglists.dwo from
the executable, if there was one) the rnglists section instead of the
correct value (from the .debug_rnglists.dwo in the .dwo file) that's
applied in DWARFUnit::tryExtractDIEsIfNeeded
2021-07-12 18:15:09 -07:00
Fangrui Song
99778149bd [llc] Default MCUseDwarfDirectory to true
For Clang, `MCUseDwarfDirectory` is true by default for the majority cases
(-fintegrated-as or -gdwarf-5; most targets use -fintegrated-as by default).
Defaulting MCUseDwarfDirectory to true can reduce the differences between clang
and llc.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D105856
2021-07-12 17:44:02 -07:00
Jon Roelofs
87b700cb10 [AArch64] Dump a little more info about unimplemented reg-to-reg copies. NFC 2021-07-12 15:37:11 -07:00
Eli Friedman
9a07cb7b00 [AArch64] Optimize overflow checks for [s|u]mul.with.overflow.i32.
Saves one instruction for signed, uses a cheaper instruction for
unsigned.

Differential Revision: https://reviews.llvm.org/D105770
2021-07-12 15:30:42 -07:00
Eli Friedman
9b52d79836 [SelectionDAG][RISCV] Support @llvm.vscale.i64() on 32-bit targets.
Not really useful on its own, but D105673 depends on it.

Differential Revision: https://reviews.llvm.org/D105840
2021-07-12 14:53:42 -07:00
Amy Kwan
28961182b7 [PowerPC] Fix the splat immediate in PPCMIPeephole depending on if we have an Altivec and VSX splat instruction.
An assertion of the following can occur because Altivec and VSX splats use a different operand number for the immediate:
```
int64_t llvm::MachineOperand::getImm() const: Assertion `isImm() && "Wrong MachineOperand accessor"' failed.
```
This patch updates PPCMIPeephole.cpp assign the correct splat immediate.

Differential Revision: https://reviews.llvm.org/D105790
2021-07-12 16:20:11 -05:00
Nikita Popov
495f2550b0 [Attributes] Determine attribute properties from TableGen data
Continuing from D105763, this allows placing certain properties
about attributes in the TableGen definition. In particular, we
store whether an attribute applies to fn/param/ret (or a combination
thereof). This information is used by the Verifier, as well as the
ForceFunctionAttrs pass. I also plan to use this in LLParser,
which also duplicates info on which attributes are valid where.

This keeps metadata about attributes in one place, and makes it
more likely that it stays in sync, rather than in various
functions spread across the codebase.

Differential Revision: https://reviews.llvm.org/D105780
2021-07-12 22:13:38 +02:00
Wouter van Oortmerssen
714dda98bc [WebAssembly] fix typo in range check for Asm locals 2021-07-12 13:07:11 -07:00
Nikita Popov
d0157d4fd5 [Attributes] Remove duplicate attribute in typeIncompatible() (NFC)
InAlloca was listed twice, once as a normal attribute, once as a
type attribute.
2021-07-12 21:59:29 +02:00
Nikita Popov
2812298c44 [Attributes] Replace doesAttrKindHaveArgument() (NFC)
This is now the same as isIntAttrKind(), so use that instead, as
it does not require manual maintenance. The naming is also more
accurate in that both int and type attributes have an argument,
but this method was only targeting int attributes.

I initially wanted to tighten the AttrBuilder assertion, but we
have some in-tree uses that would violate it.
2021-07-12 21:57:26 +02:00
Simon Pilgrim
436c202090 [CostModel][X86] Adjust fptosi/fptoui SSE/AVX legalized costs based on llvm-mca reports.
Update (mainly) vXf32/vXf64 -> vXi8/vXi16 fptosi/fptoui costs based on the worst case costs from the script in D103695.

Move to using legalized types wherever possible, which allows us to prune the cost tables.
2021-07-12 20:38:25 +01:00
Thomas Johnson
76438a2e53 [ARC] Add disassembly for the conditioned move immediate instruction
This change is a step towards implementing codegen for __builtin_clz().
Full support for CLZ with a regression test will follow shortly.
Differential Revision: https://reviews.llvm.org/D105560
2021-07-12 12:35:56 -07:00
Nikita Popov
7664c93484 [Attributes] Simplify attribute sorting (NFCI)
It's not necessary to explicitly sort by enum/int/type attribute,
as the attribute kinds are already sorted this way. We can directly
sort by kind.
2021-07-12 21:11:59 +02:00
Nikita Popov
4966784718 [Attributes] Assert correct attribute constructor is used (NFCI)
Assert that enum/int/type attributes go through the constructor
they are supposed to use.

To make sure this can't happen via invalid bitcode, explicitly
verify that the attribute kind if correct there.
2021-07-12 21:11:59 +02:00
Krzysztof Drewniak
633cb08d11 Add newline to fix documentation build
Reviewed By: xgupta

Differential Revision: https://reviews.llvm.org/D105825
2021-07-12 19:00:58 +00:00
Nikita Popov
28e27a194e [Attributes] Make type attribute handling more generic (NFCI)
Followup to D105658 to make AttrBuilder automatically work with
new type attributes. TableGen is tweaked to emit First/LastTypeAttr
markers, based on which we can handle type attributes
programmatically.

Differential Revision: https://reviews.llvm.org/D105763
2021-07-12 20:49:38 +02:00
Jinsong Ji
b94c45d19b [PowerPC] Custom Lowering BUILD_VECTOR for v2i64 for P7 as well
The lowering for v2i64 is now guarded with hasDirectMove,
however, the current lowering can handle the pattern correctly,
only lowering it when there is efficient patterns and corresponding
instructions.

The original guard was added in D21135, and was for Legal action.
The code has evloved now, this guard is not necessary anymore.

Reviewed By: #powerpc, nemanjai

Differential Revision: https://reviews.llvm.org/D105596
2021-07-12 17:56:10 +00:00
Thomas Lively
eb7eabd7be [WebAssembly] Custom combines for f32x4.demote_zero_f64x2
Replace the clang builtin function and LLVM intrinsic for
f32x4.demote_zero_f64x2 with combines from normal SDNodes. Also add missing
combines for i32x4.trunc_sat_zero_f64x2_{s,u}, which share the same pattern.

Differential Revision: https://reviews.llvm.org/D105755
2021-07-12 10:32:18 -07:00
Craig Topper
01311414db [X86] Teach X86FloatingPoint's handleCall to only erase the FP stack if there is a regmask operand that clobbers the FP stack.
There are some calls to functions like `__alloca` that are missing
a regmask operand. Lack of a regmask operand means that all
registers that aren't mentioned by def operands are preserved.
__alloca only updates EAX and ESP and has def operands for
them so this is ok. Because there is no regmask the register
allocator won't spill the FP registers across the call. Assuming
we want to keep the FP stack untoched across these calls, we
need to handle this is in the FP stackifier.

We might want to add a proper regmask operand to the code that
creates these calls to indicate all registers are preserved, but we'd
still need this change to the FP stackifier to know to preserve the
FP stack for such a regmask.

The test is kind of long, but bugpoint wasn't able to reduce it
any further.

Fixes PR50782

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D105762
2021-07-12 10:15:38 -07:00
Fangrui Song
601cf9f2c4 [llvm-readobj] Switch command line parsing from llvm::cl to OptTable
Users should generally observe no difference as long as they don't use
unintended option forms. Behavior changes:

* `-t=d` is removed. Use `-t d` instead.
* `--demangle=false` and `--demangle=0` cannot be used. Omit the option or use `--no-demangle`. Other flag-style options don't have `--no-` forms.
* `--help-list` is removed. This is a `cl::` specific option.
* llvm-readobj now supports grouped short options as well.
* `--color` is removed. This is generally not useful (only apply to errors/warnings) but was inherited from Support.

Some adjustment to the canonical forms
(usually from GNU readelf; currently llvm-readobj has too many redundant aliases):

* --dyn-syms is canonical. --dyn-symbols is a hidden alias
* --file-header is canonical. --file-headers is a hidden alias
* --histogram is canonical. --elf-hash-histogram is a hidden alias
* --relocs is canonical. --relocations is a hidden alias
* --section-groups is canonical. --elf-section-groups is a hidden alias

OptTable avoids global option collision if we decide to support multiplexing for binary utilities.

* Most one-dash long options are still supported. `-dt, -sd, -st, -sr` are dropped due to their conflict with grouped short options.
* `--section-mapping=false` (D57365) is strange but is kept for now.
* Many `cl::opt` variables were unnecessarily external. I added `static` whenever appropriate.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D105532
2021-07-12 10:14:42 -07:00
Fangrui Song
b30422e336 [test] Move AMDGPU reloc test from Object to tools/llvm-readobj and simplify it
We already have some reloc-types-elf-*.test serving the similar purpose.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D105783
2021-07-12 10:04:32 -07:00