mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

205804 Commits

Author SHA1 Message Date
Benjamin Kramer
1f13ddec12 [X86] Add a stub for Intel's alderlake.
No scheduling, no autodetection.
2020-10-24 19:01:22 +02:00
Benjamin Kramer
0dfb1f2850 [X86] Add a stub for znver3 based on the little public information there is in AMD's manuals
No scheduling, no autodetection. Just enough so -march=znver3 works.
2020-10-24 19:01:22 +02:00
dfukalov
efaecfc60e [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize cost estimations were separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve the codesize estimation in getArithmeticInstrCost(); see the sketch
after this list. The code was borrowed from the TCK_RecipThroughput path of
the base implementation.
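
A minimal sketch of the scalarization idea from item 3 (hypothetical helper, not the in-tree code): a vector operation on a type that is not simple is priced as one scalar op per lane plus the cost of moving the lanes in and out of vectors.

```cpp
// Hypothetical sketch of a codesize-style scalarization estimate; the real
// cost model queries TTI for the extract/insert and scalar-op costs.
unsigned scalarizedCodeSizeCost(unsigned NumElts, unsigned ScalarOpCost,
                                unsigned ExtractCost, unsigned InsertCost) {
  // Each lane: extract the operands, perform the scalar op, insert the result.
  return NumElts * (ScalarOpCost + ExtractCost + InsertCost);
}
```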

The next step is to unify the scalarization part in the base class, which
currently works for the TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
David Green
2c7b4a47cb [ARM] Remove some dead code. NFC 2020-10-24 17:22:49 +01:00
Stefan Gränitz
1b40698654 Reapply "[jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON"
The root cause of the test failure was fixed with:
[JITLink][ELF] PCRel32GOTLoad edge offset can be smaller than three

This reverts commit 10b1a61bafba39fd7400a814a7272f41222ad579.
2020-10-24 16:58:06 +02:00
Stefan Gränitz
48e6dace5b [JITLink][ELF] PCRel32GOTLoad edge offset can be smaller than three
Offset is 2 for MOVL instruction in test ELF_x86-64_common. This should fix the test failures.

Differential Revision: https://reviews.llvm.org/D89795
2020-10-24 16:57:48 +02:00
TaWeiTu
39d46368c8 [NPM] Port -loop-versioning-licm to NPM
Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89371
2020-10-24 21:51:18 +08:00
Stefan Gränitz
56002e0053 Revert "[jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON"
This reverts commit e9955b0843cc1e5876430f3f051494d4197419f3. Cannot reproduce the buildbot failures yet. Reverting in the meantime.
2020-10-24 15:43:06 +02:00
TaWeiTu
20376d1131 [LoopVersioning] Form dedicated exits for versioned loop to preserve simplify form
The exit blocks of the versioned and non-versioned loops are not dedicated and thus the two loops are not in simplify form.
Insert dummy exit blocks after loop versioning with `formDedicatedExits()` to preserve the simplify form for subsequent passes.
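
A hedged sketch of that fix-up step, assuming the LoopUtils helper formDedicatedExitBlocks; the surrounding analyses and the call sites are illustrative, not taken from the patch:

```cpp
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/Transforms/Utils/LoopUtils.h"

using namespace llvm;

// After versioning, both loop copies may share exit blocks with non-loop
// predecessors; re-form dedicated exits so loop-simplify form is preserved.
static void restoreSimplifyForm(Loop &VersionedLoop, Loop &NonVersionedLoop,
                                DominatorTree &DT, LoopInfo &LI) {
  formDedicatedExitBlocks(&VersionedLoop, &DT, &LI, /*MSSAU=*/nullptr,
                          /*PreserveLCSSA=*/true);
  formDedicatedExitBlocks(&NonVersionedLoop, &DT, &LI, /*MSSAU=*/nullptr,
                          /*PreserveLCSSA=*/true);
}
```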

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D89569
2020-10-24 21:40:46 +08:00
Stefan Gränitz
6584d831e6 [jitlink][ELF] Add zero-fill blocks for symbols in section SHN_COMMON
Symbols with special section index SHN_COMMON (0xfff2) haven't been handled so far and caused an invalid section error.

This is a more or less straightforward use of the code commented out at the end of the function. I checked against the ELF spec that the symbol value gives the alignment.
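
A minimal sketch of the ELF rule relied on here (illustrative helper, not the JITLink code): for an SHN_COMMON symbol, st_size gives the zero-fill size and st_value gives the required alignment.

```cpp
#include <cstdint>

// A common symbol's content is implicitly zero-initialized; per the ELF spec,
// st_size is the number of bytes and st_value carries the alignment.
struct CommonSymbolLayout {
  uint64_t ZeroFillSize;
  uint64_t Alignment;
};

CommonSymbolLayout layoutCommonSymbol(uint64_t StValue, uint64_t StSize) {
  return {StSize, StValue ? StValue : 1}; // treat 0 as "no specific alignment"
}
```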

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D89795
2020-10-24 14:54:38 +02:00
Stefan Gränitz
096558bfde [JITLink][ELF] PCRel32GOTLoad relocations are resolved like regular PCRel32 ones
The difference is that the former are indirect and go through the GOT, while the latter go to the target directly. This information can be used to relax indirect relocations that don't need the GOT (because the target is in range). We check for this optimization beforehand. For formal correctness and to avoid confusion, we should only change the relocation kind if we actually apply the relaxation.
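
An illustrative sketch of that rule (hypothetical helper, not JITLink's actual code): the edge kind is only rewritten to the direct PCRel32 form when the relaxation is actually applied, i.e. when the target fits in a signed 32-bit displacement.

```cpp
#include <cstdint>

bool fitsInPCRel32(int64_t Delta) {
  return Delta >= INT32_MIN && Delta <= INT32_MAX;
}

// Returns true if the GOT-indirect edge was relaxed to a direct reference; the
// edge kind is only changed when the relaxation is actually applied.
bool maybeRelaxGOTLoad(uint64_t FixupAddr, uint64_t TargetAddr,
                       bool &EdgeIsGOTLoad) {
  if (!EdgeIsGOTLoad || !fitsInPCRel32((int64_t)(TargetAddr - FixupAddr)))
    return false;
  EdgeIsGOTLoad = false; // now a plain PCRel32 to the target
  return true;
}
```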
2020-10-24 14:54:38 +02:00
Simon Pilgrim
8b0d7fb015 Fix some signed/unsigned comparison gcc warnings from D87930 2020-10-24 12:51:51 +01:00
Simon Pilgrim
68e9e16c89 [InstCombine] narrowFunnelShift - fold trunc/zext or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) (PR35155)
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.

Unlike matchFunnelShift, we don't include the computeKnownBits limitation, as extracting the pattern from the zext/trunc layers should be an indicator of reasonable funnel shift codegen; in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.
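
For illustration, a hypothetical C++ source pattern that lowers to the zext/shift/or/trunc shape this fold targets (whether a given frontend emits exactly this IR is an assumption):

```cpp
#include <cstdint>

// An 8-bit funnel shift written in a wider type: the operands are zero-extended,
// shifted, or'd together, and the result truncated back to 8 bits.
uint8_t funnel_shift_left8(uint8_t A, uint8_t B, unsigned Amt) {
  unsigned Wide = (unsigned(A) << (Amt & 7)) | (unsigned(B) >> (8 - (Amt & 7)));
  return uint8_t(Wide); // candidate for narrowing to llvm.fshl.i8(A, B, Amt)
}
```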

Differential Revision: https://reviews.llvm.org/D89542
2020-10-24 12:42:43 +01:00
Simon Pilgrim
e7c5dcedac [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Simon Pilgrim
fa551c490c [LegalizeTypes] Legalize vector rotate operations
Lower vector rotate operations as long as the legalization occurs outside of LegalizeVectorOps.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47320

Patch By: @rsanthir.quic (Ryan Santhirarajan)

Differential Revision: https://reviews.llvm.org/D89497
2020-10-24 11:30:32 +01:00
Nikita Popov
0ecd25c73f [BasicAA] Avoid duplicate cache lookup (NFCI)
Rather than performing the cache lookup with both possible orders
for the locations, use the same canonicalization as the other
AliasCache lookups in BasicAA.
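
A minimal sketch of the canonicalization idea (illustrative types, not the actual AliasCache): pick one canonical order for the pair key so a single lookup covers both query orders.

```cpp
#include <map>
#include <utility>

using LocId = unsigned;
using AliasCache = std::map<std::pair<LocId, LocId>, int>;

// Probe the cache once with a canonical (smaller, larger) key instead of
// trying both (A, B) and (B, A).
int *lookupAlias(AliasCache &Cache, LocId A, LocId B) {
  auto Key = A <= B ? std::make_pair(A, B) : std::make_pair(B, A);
  auto It = Cache.find(Key);
  return It == Cache.end() ? nullptr : &It->second;
}
```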
2020-10-24 10:19:02 +02:00
Nikita Popov
ac22411ea8 [BasicAA] Fix caching in the presence of phi cycles
Any time we insert a block into VisitedPhiBBs, previously cached
values may no longer be valid for the recursive alias queries. As
such, perform them using an empty AAQueryInfo.

Note that if we recurse to the same phi, the block will already
be inserted, so we reuse the old AAQueryInfo, and thus still
protect against infinite recursion.

This problem can appear both with and without BatchAA, but is more
likely to occur with BatchAA, as more values are cached.

Differential Revision: https://reviews.llvm.org/D90066
2020-10-24 09:58:02 +02:00
Jonas Paulsson
0d62b14425 [SystemZ] Define MaxInstLength to have the value of 6.
This value previously defaulted to 4, which caused branch relaxation to fail.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D90065
2020-10-24 09:19:34 +02:00
Tony
a12729cc2b [AMDGPU] Cleanup AMDGPUUsage.rst
- Layout and typo improvements.
- Add memory spaces section.
- reStructuredText syntax fixes.

Differential Revision: https://reviews.llvm.org/D90002
2020-10-24 06:21:27 +00:00
Med Ismail Bennani
b398ab2619 [llvm/DebugInfo] Emit DW_OP_implicit_value when tuning for LLDB
This patch enables emitting DWARF `DW_OP_implicit_value` opcode when
tuning debug information for LLDB (`-debugger-tune=lldb`).

This will also propagate to Darwin platforms, since they use LLDB tuning
as a default.

rdar://67406059

Differential Revision: https://reviews.llvm.org/D90001

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2020-10-24 06:45:33 +02:00
Zequan Wu
243529579b [llvm-cov] don't include all source files when provided source files are filtered out
When all provided source files are filtered out, either due to `--ignore-filename-regex` or because they are not part of the binary, don't generate coverage results for all source files. If users wanted coverage results for all source files, they wouldn't need to provide selected source files or `--ignore-filename-regex` in the first place.

Differential Revision: https://reviews.llvm.org/D89359
2020-10-23 19:32:16 -07:00
David Blaikie
174725a4ae llvm-dwarfdump: Support verbose printing DW_OP_convert to print the CU local offset before the resolved absolute offset 2020-10-23 18:50:15 -07:00
Hongtao Yu
243bd8f652 [AutoFDO] Remove a broken assert in merging inlinee samples
Duplicated callsites share the same callee profile if the original callsite was inlined. The sharing also causes the profile of the callee's callee to be shared. This breaks the assert introduced earlier by D84997 in a tricky way.

To illustrate, I'm using an abstract example. Say we have three functions `A`, `B` and `C`. A calls B twice and B calls C once. Some optimization performed prior to the sample profile loader duplicates the first callsite to `B`, so the program may look like

```
A()
{
  B();  // with nested profile B1 and C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2
}
```

For some reason, the sample profile loader inliner then decides to only inline the first callsite in `A` and transforms `A` into

```
A()
{
  C();  // with nested profile C1
  B();  // duplicated, with nested profile B1 and C1
  B();  // with nested profile B2 and C2.
}
```

Here is what happens next:

1. Failing to inline the callsite `C()` results in `C1`'s samples being returned to `C`'s base (outlined) profile. In the meantime, `C1`'s head samples are updated to `C1`'s entry sample. This also affects the profile of the middle callsite, which shares `C1` with the first callsite.
2. Failing to inline the middle callsite results in `B1` being returned to `B`'s base profile, which in turn causes `C1` to be merged into `B`'s base profile. Note that the nested `C` profile in `B`'s base now has a non-zero head sample count. The value actually equals `C1`'s entry count.
3. Failing to inline the last callsite results in `B2` being returned to `B`'s base profile. Note that the nested `C` profile in `B`'s base now has an entry count equal to the sum of that of `C1` and `C2`, with the head count equal to that of `C1`. This will trigger the assert later on.
4. `B` is then compiled using `B`'s base profile. Failing to inline `C` there triggers the returning of the nested `C` profile. Since the nested `C` profile has a non-zero head count, the returning doesn't go through. Instead, the assert goes off.

It's good that `C1` is only returned once, based on using a non-zero head count to ensure an inlined profile is only returned once. However, `C2` is never returned. While it seems hard to solve this perfectly within the current framework, I'm just removing the broken assert. This should be reasonably fixed by the upcoming CSSPGO work, where count returning is based on context sensitivity and a distribution factor for callsite probes.

The simple example is extracted from one of our internal services. In reality, why the original callsite `B()` and its duplicate end up with different inlining behavior is somewhat of a mystery. It has to do with imperfect counts in the profile and extra-complicated inlining that makes their hotness differ.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D90056
2020-10-23 17:42:21 -07:00
Mehdi Amini
d01d6f8450 Remove unused verifyRegStateMapping() function in RegAllocFast (NFC)
This fixes compiler warning when building with assertions.
2020-10-24 00:36:51 +00:00
Krzysztof Parzyszek
28d3c031c0 [Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
2020-10-23 18:22:03 -05:00
Arthur Eubanks
77ffdb3e3c [StructurizeCFG][NewPM] Port -structurizecfg to NPM
This doesn't support -structurizecfg-skip-uniform-regions since that
would require porting LegacyDivergenceAnalysis.

The NPM doesn't support adding a non-analysis pass as a dependency of
another, so I had to add -lowerswitch to some tests or pin them to the
legacy PM.

This is the only RegionPass in tree, so I simply copied the logic for
finding all Regions from the legacy PM's RGManager into
StructurizeCFG::run().

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89026
2020-10-23 15:54:03 -07:00
Arthur Eubanks
2cbef9de5f [Inliner][NPM] Properly pass callee AAResults
Fixes noalias-calls.ll under NPM.

Differential Revision: https://reviews.llvm.org/D89592
2020-10-23 15:37:18 -07:00
Arthur Eubanks
40e31d2d13 [test] Simplify pr33641_remove_arg_dbgvalue.ll
This makes it pass under the NPM.
The legacy PM ran passes on SCCs in a different order, causing
argpromotion to not trigger on @bar().

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D89889
2020-10-23 15:36:05 -07:00
Keith Smiley
63e3d161d6 [llvm-install-name-tool] Add -prepend_rpath option
This diff adds the option -prepend_rpath which inserts an rpath as
the first rpath in the binary.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D89605
2020-10-23 15:01:03 -07:00
Evandro Menezes
809a4483be [RISCV] Use the commercial name for scheduling model (NFC)
Use the commercial name for the scheduling model for the SiFive 7 Series.
2020-10-23 16:33:27 -05:00
Cameron McInally
94734c2c54 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Artur Pilipenko
31af2fa7ed GC-parseable element atomic memcpy/memmove
This change introduces a GC-parseable lowering for the element atomic
memcpy/memmove intrinsics. This way the runtime can provide an
implementation which can take a safepoint during the copy operation.

See "GC-parseable element atomic memcpy/memmove" thread on llvm-dev
for the background and details:
https://groups.google.com/g/llvm-dev/c/NnENHzmX-b8/m/3PyN8Y2pCAAJ

Differential Revision: https://reviews.llvm.org/D88861
2020-10-23 14:06:09 -07:00
Michael Liao
59287b0fc6 Fix shared build. NFC. 2020-10-23 15:53:05 -04:00
Florian Hahn
9f772c5b7d [AArch64] Add vector compare/select cost-model tests. 2020-10-23 20:43:04 +01:00
Geoffrey Martin-Noble
91a4ce9865 Unconditionally #include <future>
This unbreaks building with `LLVM_ENABLE_THREADS=0`. Since
https://github.com/llvm/llvm-project/commit/069919c9ba33, usage of
`std::promise` is not guarded by `LLVM_ENABLE_THREADS`, so this header
must be unconditionally included.
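
A small self-contained illustration (hypothetical function, not the code touched here): std::promise/std::future work without spawning any thread, but they still require <future>.

```cpp
#include <future>

// No thread is created here, yet <future> is needed even when
// LLVM_ENABLE_THREADS=0.
int computeSync() {
  std::promise<int> P;
  std::future<int> F = P.get_future();
  P.set_value(42);
  return F.get();
}
```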

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D89758
2020-10-23 19:17:37 +00:00
Arthur Eubanks
d924016b11 [gn build] Add missing comma 2020-10-23 12:01:23 -07:00
Nick Desaulniers
e95a065d26 [IR] add fn attr for no_stack_protector; prevent inlining on mismatch
It's currently ambiguous in IR whether the source language explicitly
did not want a stack protector (in C, via the function attribute
no_stack_protector) or simply doesn't care, for any given function.

It's common for code that manipulates the stack via inline assembly or
that has to set up its own stack canary (such as the Linux kernel) to want
to avoid stack protectors in certain functions. In this case, we've
been bitten by numerous bugs where a callee with a stack protector is
inlined into an __attribute__((__no_stack_protector__)) caller, which
generally breaks the caller's assumptions about not having a stack
protector. LTO exacerbates the issue.

While developers can avoid this by putting all no_stack_protector
functions together in one translation unit and compiling those with
-fno-stack-protector, that is generally not as ergonomic as a function
attribute, and it still doesn't work for LTO. See also:
https://lore.kernel.org/linux-pm/20200915172658.1432732-1-rkir@google.com/
https://lore.kernel.org/lkml/20200918201436.2932360-30-samitolvanen@google.com/T/#u

Typically, when inlining a callee into a caller, the caller will be
upgraded in its level of stack protection (see adjustCallerSSPLevel()).
By adding an explicit attribute in the IR when the function attribute is
used in the source language, we can now identify such cases and prevent
inlining. Inlining is blocked when the callee and caller differ, i.e. when one
carries `nossp` while the other has `ssp`, `sspstrong`, or `sspreq`.
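
A hedged illustration of the mismatch being guarded against (function names are made up; the source-level attribute is the one named above):

```cpp
// Compiled with -fstack-protector-strong, this helper would carry `sspstrong`.
void protected_helper(void) {
  char Buf[64];
  __builtin_memset(Buf, 0, sizeof(Buf));
}

// Explicitly opts out of stack protection. With this patch, inlining
// protected_helper() here is blocked instead of silently upgrading the
// caller's protection level.
__attribute__((no_stack_protector))
void unprotected(void) {
  protected_helper();
}
```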

Fixes pr/47479.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D87956
2020-10-23 11:55:39 -07:00
Stanislav Mekhanoshin
270ab79b5a [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but it is needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. The RC is SReg_32, but the DRC for scratch instructions
is SReg_32_XEXEC_HI, so the test fails.

For a physreg it is sufficient to just check whether DRC contains the
register. Physregs also do not use subregs, so the subreg handling
below is irrelevant for them.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
Hubert Tong
528517aa74 [AIX][cmake] Adjust management of -G for linking
The change in 0ba98433971f changed the behaviour of the build when
using an XL build compiler because `-G` is not a pure linker option:
it also implies `-shared`. This was accounted for in the base CMake
configuration, so an analysis of the change from 0ba98433971f in
relation to a build using Clang (where `-shared` is introduced by CMake)
would not identify the issue. This patch resolves this particular issue
by adding `-shared` alongside `-Wl,-G`.

At the same time, the investigation reveals that several aspects of the
various build configurations are not operating in the manner originally
intended.

The other issue related to the `-G` linker option in the build is that
the removal of it (to avoid unnecessary use of run-time linking) is not
effective for the build using the Clang compiler. This patch addresses
this by adjusting the regular expressions used to remove the broadly-
applied `-G`.

Finally, the issue of specifying the export list with `-Wl,` instead of
a compiler option is flagged with a FIXME comment.

Reviewed By: daltenty, amyk

Differential Revision: https://reviews.llvm.org/D90041
2020-10-23 14:32:36 -04:00
Nikita Popov
88e55394ae [BasicAA] Add additional phi cycle test (NFC)
This is a variation of the BatchAA problem that also applies
without BatchAA. We may have a cached result from earlier in
the same query.
2020-10-23 20:31:20 +02:00
Mircea Trofin
fbc18d8c7b [NFC] Use [MC]Register in RegAllocGreedy
This was initiated from the uses of MCRegUnitIterator, so while likely
not exhaustive, it's a step forward.

Differential Revision: https://reviews.llvm.org/D89975
2020-10-23 11:30:53 -07:00
Baptiste Saleil
e248116f2c [PowerPC] Add intrinsics for MMA
This patch adds support for MMA intrinsics.

Authored by: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D89345
2020-10-23 13:16:02 -05:00
Nikita Popov
bac894562a [PhiValues] Use SetVector to avoid non-determinism
I'm not sure whether this can cause actual non-determinism in the
compiler output, but at least it causes non-determinism in the
statistics collected by BasicAA.

Use SetVector to have a predictable iteration order.
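
A small illustrative sketch (not the PhiValues code itself): llvm::SetVector keeps set semantics but iterates in insertion order, so anything derived from the traversal stops depending on pointer hashing.

```cpp
#include "llvm/ADT/SetVector.h"
#include <cstdio>

void visitInOrder(const int *const *Ptrs, unsigned N) {
  llvm::SetVector<const int *> Visited;
  for (unsigned I = 0; I != N; ++I)
    Visited.insert(Ptrs[I]); // duplicates ignored, first-insertion order kept
  for (const int *P : Visited) // deterministic, insertion-ordered traversal
    std::printf("%p\n", (const void *)P);
}
```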
2020-10-23 20:14:02 +02:00
Mircea Trofin
55c9f68e17 [MLInliner] Disable always inliner in bounds tests
That changes the threshold calculation.
2020-10-23 10:24:51 -07:00
Amara Emerson
a5ff88d2db [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
```
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
```
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag-setting operations within an
interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However, if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into
non-flag-setting variants. The optimization here is to find these dead imp-defs and
mark them as such.

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00
LLVM GN Syncbot
df8c67668f [gn build] Port dbbc4f4e226 2020-10-23 17:06:41 +00:00
Arthur Eubanks
44db41329f Revert "[CGSCC] Detect devirtualization in more cases"
This reverts commit 3024fe5b55ed72633915f613bd5e2826583c396f.

Causes major compile time regressions:
https://llvm-compile-time-tracker.com/compare.php?from=3b8d8954bf2c192502d757019b9fe434864068e9&to=3024fe5b55ed72633915f613bd5e2826583c396f&stat=instructions
2020-10-23 09:53:52 -07:00
Alex Orlov
41781efce1 Added a utility to launch tests on a target remotely.
It runs an executable on a remote host.
This is meant to be used as an executor when running the LLVM and library tests on a target.

Reviewed By: vvereschaka

Differential Revision: https://reviews.llvm.org/D89349
2020-10-23 20:52:30 +04:00
Lang Hames
a156cf61cb Re-apply "[JITLink][ELF] Add support for ELF::R_X86_64_REX_GOTPCRELX relocation"
This re-applies e2fceec2fd1 with fixes. Apparently we already *do* support
relaxation for ELF, so we need to make sure the test case allocates a slab at
a fixed address, and that the R_X86_64_REX_GOTPCRELX test references an external
that is guaranteed to be out of range.
2020-10-23 09:48:05 -07:00
Huihui Zhang
f5744161af [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
The immediate must be in the integer range [0, 255] for the umin/umax instructions.
Extend the pattern-matching helper SelectSVEArithImm() to take the value type
bitwidth when checking whether the immediate value is in range.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00