1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
Commit Graph

205643 Commits

Author SHA1 Message Date
Simon Pilgrim
8b0d7fb015 Fix some signed/unsigned comparison gcc warnings from D87930 2020-10-24 12:51:51 +01:00
Simon Pilgrim
68e9e16c89 [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be a indicator of reasonable funnel shift codegen, in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
Simon Pilgrim
e7c5dcedac [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim
fa551c490c [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Nikita Popov
0ecd25c73f [BasicAA] Avoid duplicate cache lookup (NFCI)
Rather than performing the cache lookup with both possible orders
for the locations, use the same canonicalization as the other
AliasCache lookups in BasicAA.
2020-10-24 10:19:02 +02:00
Nikita Popov
ac22411ea8 [BasicAA] Fix caching in the presence of phi cycles
Any time we insert a block into VisitedPhiBBs, previously cached
values may no longer be valid for the recursive alias queries. As
such, perform them using an empty AAQueryInfo.

Note that if we recurse to the same phi, the block will already
be inserted, so we reuse the old AAQueryInfo, and thus still
protect against infinite recursion.

This problem can appear with with an without BatchAA, but is more
likely to occur with BatchAA, as more values are cached.

Differential Revision: https://reviews.llvm.org/D90066
2020-10-24 09:58:02 +02:00
Jonas Paulsson
0d62b14425 [SystemZ] Define MaxInstLength to have the value of 6.
This value had the default value of 4 which caused branch relaxation to fail.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D90065
2020-10-24 09:19:34 +02:00
Tony
a12729cc2b [AMDGPU] Cleanup AMDGPUUsage.rst
- Layout and typo improvements.
- Add memory spaces section.
- reStructure syntax fixes.

Differential Revision: https://reviews.llvm.org/D90002
2020-10-24 06:21:27 +00:00
Med Ismail Bennani
b398ab2619 [llvm/DebugInfo] Emit DW_OP_implicit_value when tuning for LLDB
This patch enables emitting DWARF `DW_OP_implicit_value` opcode when
tuning debug information for LLDB (`-debugger-tune=lldb`).

This will also propagate to Darwin platforms, since they use LLDB tuning
as a default.

rdar://67406059

Differential Revision: https://reviews.llvm.org/D90001

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2020-10-24 06:45:33 +02:00
Zequan Wu
243529579b [llvm-cov] don't include all source files when provided source files are filtered out
When all provided source files are filtered out either due to `--ignore-filename-regex` or not part of binary, don't generate coverage reults for all source files. Because if users want to generate coverage results for all source files, they don't even need to provid selected source files or `--ignore-filename-regex`.

Differential Revision: https://reviews.llvm.org/D89359
2020-10-23 19:32:16 -07:00
David Blaikie
174725a4ae llvm-dwarfdump: Support verbose printing DW_OP_convert to print the CU local offset before the resolved absolute offset 2020-10-23 18:50:15 -07:00
Hongtao Yu
243bd8f652 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of callee's callee to be shared. This breaks the assert introduced ealier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimize performed prior to the sample profile loader duplicates first callsite to `B` and the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

	1. Failing to inline the callsite `C()` results in `C1`'s samples returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite which shares `C1` with the first callsite.
	2. Failing to inline the middle callsite results in `B1` returned to `B`'s base profile, which in turn will cause `C1` merged into `B`'s base profile. Note that the nest `C` profile in `B`'s base has a non-zero head sample count now. The value actually equals to `C1`'s entry count.
	3. Failing to inline last callsite results in `B2` returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
        4. Compiling `B` using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inline profile is only returned once. However C2 is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work where counts returning is based on context-sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and duplicate one having different inline behavior is a magic. It has to do with imperfect counts in profile and extra complicated inlining that makes the hotness for them different.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Mehdi Amini
d01d6f8450 Remove unused verifyRegStateMapping() function in RegAllocFast (NFC)
This fixes compiler warning when building with assertions.
2020-10-24 00:36:51 +00:00
Krzysztof Parzyszek
28d3c031c0 [Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
2020-10-23 18:22:03 -05:00
Arthur Eubanks
77ffdb3e3c [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks
2cbef9de5f [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Arthur Eubanks
40e31d2d13 [test] Simplify pr33641_remove_arg_dbgvalue.ll
This makes it pass under the NPM.
The legacy PM pass ran passes on SCCs in a different order, causing
argpromotion to not trigger on @bar().

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D89889
2020-10-23 15:36:05 -07:00
Keith Smiley
63e3d161d6 [llvm-install-name-tool] Add -prepend_rpath option
This diff adds the option -prepend_rpath which inserts an rpath as
the first rpath in the binary.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D89605
2020-10-23 15:01:03 -07:00
Evandro Menezes
809a4483be [RISCV] Use the commercial name for scheduling model (NFC)
Use the commercial name for the scheduling model for the SiFive 7 Series.
2020-10-23 16:33:27 -05:00
Cameron McInally
94734c2c54 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Artur Pilipenko
31af2fa7ed GC-parseable element atomic memcpy/memmove
This change introduces a GC parseable lowering for element atomic
memcpy/memmove intrinsics. This way runtime can provide an
implementation which can take a safepoint during copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Michael Liao
59287b0fc6 Fix shared build. NFC. 2020-10-23 15:53:05 -04:00
Florian Hahn
9f772c5b7d [AArch64] Add vector compare/select cost-model tests. 2020-10-23 20:43:04 +01:00
Geoffrey Martin-Noble
91a4ce9865 Unconditionally #include <future>
This unbreaks building with `LLVM_ENABLE_THREADS=0`. Since
https://github.com/llvm/llvm-project/commit/069919c9ba33 usage of
`std::promise` is not guarded by `LLVM_ENABLE_THREADS`, so this header
must be unconditionally included.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D89758
2020-10-23 19:17:37 +00:00
Arthur Eubanks
d924016b11 [gn build] Add missing comma 2020-10-23 12:01:23 -07:00
Nick Desaulniers
e95a065d26 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack a stack protector (in C, via function attribute
no_stack_protector) or doesn't care for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) would
like to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions in one translation unit together and compiling those with
-fno-stack-protector, it's generally not very ergonomic or as
ergonomic as a function attribute, and still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining.  Block inlining when the callee and caller differ in the case that one
contains `nossp` when the other has `ssp`, `sspstrong`, or `sspreq`.

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Stanislav Mekhanoshin
270ab79b5a [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.

That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
Hubert Tong
528517aa74 [AIX][cmake] Adjust management of -G for linking
The change in 0ba98433971f changed the behaviour of the build when
using an XL build compiler because `-G` is not a pure linker option:
it also implies `-shared`. This was accounted for in the base CMake
configuration, so an analysis of the change from 0ba98433971f in
relation to a build using Clang (where `-shared` is introduced by CMake)
would not identify the issue. This patch resolves this particular issue
by adding `-shared` alongside `-Wl,-G`.

At the same time, the investigation reveals that several aspects of the
various build configurations are not operating in the manner originally
intended.

The other issue related to the `-G` linker option in the build is that
the removal of it (to avoid unnecessary use of run-time linking) is not
effective for the build using the Clang compiler. This patch addresses
this by adjusting the regular expressions used to remove the broadly-
applied `-G`.

Finally, the issue of specifying the export list with `-Wl,` instead of
a compiler option is flagged with a FIXME comment.

Reviewed By: daltenty, amyk

Differential Revision: https://reviews.llvm.org/D90041
2020-10-23 14:32:36 -04:00
Nikita Popov
88e55394ae [BasicAA] Add additional phi cycle test (NFC)
This is a variation of the BatchAA problem that also applies
without BatchAA. We may have a cached result from earlier in
the same query.
2020-10-23 20:31:20 +02:00
Mircea Trofin
fbc18d8c7b [NFC] Use [MC]Register in RegAllocGreedy
This was initiated from the uses of MCRegUnitIterator, so while likely
not exhaustive, it's a step forward.

Differential Revision: https://reviews.llvm.org/D89975
2020-10-23 11:30:53 -07:00
Baptiste Saleil
e248116f2c [PowerPC] Add intrinsics for MMA
This patch adds support for MMA intrinsics.

Authored by: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D89345
2020-10-23 13:16:02 -05:00
Nikita Popov
bac894562a [PhiValues] Use SetVector to avoid non-determinism
I'm not sure whether this can cause actual non-determinism in the
compiler output, but at least it causes non-determinism in the
statistics collected by BasicAA.

Use SetVector to have a predictable iteration order.
2020-10-23 20:14:02 +02:00
Mircea Trofin
55c9f68e17 [MLInliner] Disable always inliner in bounds tests
That changes the threshold calculation.
2020-10-23 10:24:51 -07:00
Amara Emerson
a5ff88d2db [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag setting operations between
a interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00
LLVM GN Syncbot
df8c67668f [gn build] Port dbbc4f4e226 2020-10-23 17:06:41 +00:00
Arthur Eubanks
44db41329f Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 3024fe5b55ed72633915f613bd5e2826583c396f.

Causes major compile time regressions:
https://llvm-compile-time-tracker.com/compare.php?from=3b8d8954bf2c192502d757019b9fe434864068e9&to=3024fe5b55ed72633915f613bd5e2826583c396f&stat=instructions
2020-10-23 09:53:52 -07:00
Alex Orlov
41781efce1 Added utility to launch tests on a target remotely.
Runs an executable on a remote host.
This is meant to be used as an executor when running the LLVM and the Libraries tests on a target.

Reviewed By: vvereschaka

Differential Revision: https://reviews.llvm.org/D89349
2020-10-23 20:52:30 +04:00
Lang Hames
a156cf61cb Re-apply "[JITLink][ELF] Add support for ELF::R_X86_64_REX_GOTPCRELX relocation"
This re-applies e2fceec2fd1 with fixes. Apparently we already *do* support
relaxation for ELF, so we need to make sure the test case allocates a slab at
a fixed address, and that the R_X86_64_REX_GOTPCRELX test references an external
that is guaranteed to be out of range.
2020-10-23 09:48:05 -07:00
Huihui Zhang
f5744161af [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Victor Huang
e4448695d1 [PowerPC] Fix the Predicates for enabling pcrelative-memops and PLXVP/PSTXVP definitions
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions

Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
2020-10-23 11:33:20 -05:00
LLVM GN Syncbot
5b19aaf8b1 [gn build] Port 00255f41929 2020-10-23 16:19:55 +00:00
vpykhtin
9d887381ef [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse.
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.

First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.

I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.

Differential Revision: https://reviews.llvm.org/D89386
2020-10-23 19:17:48 +03:00
Paulo Matos
f7fc888024 [WebAssembly] Implementation of (most) table instructions
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.

Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.

Added more tests to tables.s

Differential Revision: https://reviews.llvm.org/D89797
2020-10-23 08:42:54 -07:00
Jeremy Morse
0c18b2fb0d [DebugInstrRef] Handle DBG_INSTR_REFs use-before-defs in LiveDebugValues
Deciding where to place debugging instructions when normal instructions
sink between blocks is difficult -- see PR44117. Dealing with this with
instruction-referencing variable locations is simple: we just tolerate
DBG_INSTR_REFs referring to values that haven't been computed yet. This
patch adds support into InstrRefBasedLDV to record when a variable value
appears in the middle of a block, and should have a DBG_VALUE added when it
appears (a debug use before def).

While described simply, this relies heavily on the value-propagation
algorithm in InstrRefBasedLDV. The implementation doesn't attempt to verify
the location of a value unless something non-trivial occurs to merge
variable values in vlocJoin. This means that a variable with a value that
has no location can retain it across all control flow (including loops).
It's only when another debug instruction specifies a different variable
value that we have to check, and find there's no location.

This property means that if a machine value is defined in a block dominated
by a DBG_INSTR_REF that refers to it, all the successor blocks can
automatically find a location for that value (if it's not clobbered). Thus
in a sense, InstrRefBasedLDV is already supporting and implementing
use-before-defs. This patch allows us to specify a variable location in the
block where it's defined.

When loading live-in variable locations, TransferTracker currently discards
those where it can't find a location for the variable value. However, we
can tell from the machine value number whether the value is defined in this
block. If it is, add it to a set of use-before-def records. Then, once the
relevant instruction has been processed, emit a DBG_VALUE immediately after
it.

Differential Revision: https://reviews.llvm.org/D85775
2020-10-23 16:33:23 +01:00
Jay Foad
9321aed101 [AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028
2020-10-23 16:16:13 +01:00
Denis Antrushin
91be48b03e Revert "[Statepoints] Allow deopt GC pointer on VReg if gc-live bundle is empty."
Downstream testing revealed some problems with this patch.
Reverting while investigating.
This reverts commit 2b96dcebfae65485859d956954f10f409abaae79.
2020-10-23 21:55:06 +07:00
Nicolai Hähnle
12fb986d8e CfgInterface: rename interface() to getInterface()
Apparently there are some Microsoft headers which
`#define interface struct`. This method is only used
in pending changes so far.

Change-Id: Ic68fe8e1958ec9b015f817ee218431f4146b888a
2020-10-23 16:52:10 +02:00
Simon Pilgrim
2c1de86bff [InstCombine] Add i8 bitreverse by multiplication test patterns
Pulled from bit twiddling hacks webpage
2020-10-23 15:39:57 +01:00
Simon Pilgrim
0f2f9e2cf6 [InstCombine] Add 8/16/32/64 bitreverse test coverage
Use typical codegen for the traditional pairwise lgN bitreverse algorithm
2020-10-23 15:39:56 +01:00
Simon Pilgrim
124244322c [InstCombine] Add initial bitreverse test coverage 2020-10-23 15:39:56 +01:00