1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

208409 Commits

Author SHA1 Message Date
Lucas Prates
4d5426f96a [ARM][AAarch64] Initial command-line support for v8.7-A
This introduces command-line support for the 'armv8.7-a' architecture name
(and an alias without the '-', as usual), and for the 'ls64' extension name.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91776
2020-12-17 13:47:28 +00:00
Lucas Prates
06f39003e9 [AArch64] Adding the v8.7-A LD64B/ST64B Accelerator extension
This adds support for the v8.7-A LD64B/ST64B Accelerator extension
through a subtarget feature called "ls64". It adds four 64-byte
load/store instructions with an operand in the new GPR64x8 register
class, and one system register that's part of the same extension.

Based on patches written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91775
2020-12-17 13:46:23 +00:00
Lucas Prates
761abd9c8a [AArch64] Add a GPR64x8 register class
This adds a GPR64x8 register class that will be needed as the data
operand to the LD64B/ST64B family of instructions in the v8.7-A
Accelerator Extension, which load or store a contiguous range of eight
x-regs. It has to be its own register class so that register allocation
will have visibility of the full set of registers actually read/written
by the instructions, which will be needed when we add intrinsics and/or
inline asm access to this piece of architecture.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91774
2020-12-17 13:45:46 +00:00
Lucas Prates
870ec0cc7f [ARM][AArch64] Adding basic support for the v8.7-A architecture
This introduces support for the v8.7-A architecture through a new
subtarget feature called "v8.7a". It adds two new "WFET" and "WFIT"
instructions, the nXS limited-TLB-maintenance qualifier for DSB and TLBI
instructions, a new CPU id register, ID_AA64ISAR2_EL1, and the new
HCRX_EL2 system register.

Based on patches written by Simon Tatham and Victor Campos.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91772
2020-12-17 13:45:08 +00:00
Lucas Prates
c036de1337 [NFC][AArch64] Capturing multiple feature requirements in AsmParser messages
This enables the capturing of multiple required features in the AArch64
AsmParser's SysAlias error messages.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D92388
2020-12-17 13:44:17 +00:00
Lucas Prates
2f00dfdbd4 [NFC][AArch64] Move AArch64 MSR/MRS into a new decoder namespace
This removes the general forms of the AArch64 MSR and MRS instructions
from the same decoding table that contains many more specific
instructions that supersede them. They're now in a separate decoding
table of their own, called "Fallback", which is only consulted in the
event of the main decoder table failing to produce an answer.

This should avoid decoding conflicts on future specialized instructions
in the MSR space.

Patch written by Simon Tatham.

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D91771
2020-12-17 13:40:10 +00:00
Florian Hahn
1341177ee3 [IRBuilder] Generalize debug loc handling for arbitrary metadata.
This patch extends IRBuilder to allow adding/preserving arbitrary
metadata on created instructions.

Instead of using references to specific metadata nodes (like DebugLoc),
IRbuilder now keeps a vector of (metadata kind, MDNode *) pairs, which
are added to each created instruction.

The patch itself is a NFC and only moves the existing debug location
handling over to the new system. In a follow-up patch it will be used to
preserve !annotation metadata besides !dbg.

The current approach requires iterating over MetadataToCopy to avoid
adding duplicates, but given that the number of metadata kinds to
copy/preserve is going to be very small initially (0, 1 (for !dbg) or 2
(!dbg and !annotation)) that should not matter.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D93400
2020-12-17 13:27:43 +00:00
Cullen Rhodes
2b9814f6d5 [LV] Disable epilogue vectorization for scalable VFs
Epilogue vectorization doesn't support scalable vectorization factors
yet, disable it for now.

Reviewed By: sdesmalen, bmahjour

Differential Revision: https://reviews.llvm.org/D93063
2020-12-17 12:14:03 +00:00
Simon Pilgrim
058127650f [DebugInfo] Fix MSVC build by adding back necessary reverse_iterator != operator
Put back the std::reverse_iterator<DWARFDie::iterator> != operator that was removed in D78938 to fix VS2019 builds
2020-12-17 12:06:44 +00:00
Kerry McLaughlin
c047e26e14 [AArch64] Renamed sve-masked-scatter-legalise.ll. NFC. 2020-12-17 11:40:09 +00:00
Kerry McLaughlin
b0995cb44f [SVE][CodeGen] Add bfloat16 support to scalable masked gather
Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D93307
2020-12-17 11:08:15 +00:00
dfukalov
b7b67e3e9a [NFC] Reduce include files dependency and AA header cleanup (part 2).
Continuing work started in https://reviews.llvm.org/D92489:

Removed a bunch of includes from "AliasAnalysis.h" and "LoopPassManager.h".

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D92852
2020-12-17 14:04:48 +03:00
Simon Pilgrim
fdd8c22c46 [X86] Remove extract_subvector(subv_broadcast_load()) fold.
This was needed in an earlier version of D92645, but isn't now - and I've just noticed that it was potentially flawed depending on the relevant widths of the broadcasted and extracted subvectors.
2020-12-17 11:02:49 +00:00
Krasimir Georgiev
4e1f84e7cc fix a -Wunused-variable warning in release build 2020-12-17 11:52:00 +01:00
Barry Revzin
2fc9f32ca3 Make LLVM build in C++20 mode
Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).

There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:

// 1) Missing const
namespace missing_const {
    struct A {
    #ifndef FIXED
        bool operator==(A const&);
    #else
        bool operator==(A const&) const;
    #endif
    };

    bool a = A{} == A{}; // error
}

// 2) Type mismatch on CRTP
namespace crtp_mismatch {
    template <typename Derived>
    struct Base {
    #ifndef FIXED
        bool operator==(Derived const&) const;
    #else
        // in one case changed to taking Base const&
        friend bool operator==(Derived const&, Derived const&);
    #endif
    };

    struct D : Base<D> { };

    bool b = D{} == D{}; // error
}

// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
    template <bool Const>
    struct iterator {
        using const_iterator = iterator<true>;

        iterator();

        template <bool B, std::enable_if_t<(Const && !B), int> = 0>
        iterator(iterator<B> const&);

    #ifndef FIXED
        bool operator==(const_iterator const&) const;
    #else
        friend bool operator==(iterator const&, iterator const&);
    #endif
    };

    bool c = iterator<false>{} == iterator<false>{} // error
          || iterator<false>{} == iterator<true>{}
          || iterator<true>{} == iterator<false>{}
          || iterator<true>{} == iterator<true>{};
}

// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
    enum Color { Red };

    struct C {
        C();
        C(Color);
        operator Color() const;
        bool operator==(Color) const;
        friend bool operator==(C, C);
    };

    bool c = C{} == C{}; // error
    bool d = C{} == Red;
}

Differential revision: https://reviews.llvm.org/D78938
2020-12-17 10:44:10 +00:00
Simon Pilgrim
ac71050514 [X86] Add X86ISD::SUBV_BROADCAST_LOAD and begin removing X86ISD::SUBV_BROADCAST (PR38969)
Subvector broadcasts are only load instructions, yet X86ISD::SUBV_BROADCAST treats them more generally, requiring a lot of fallback tablegen patterns.

This initial patch replaces constant vector lowering inside lowerBuildVectorAsBroadcast with direct X86ISD::SUBV_BROADCAST_LOAD loads which helps us merge a number of equivalent loads/broadcasts.

As well as general plumbing/analysis additions for SUBV_BROADCAST_LOAD, I needed to wrap SelectionDAG::makeEquivalentMemoryOrdering so it can handle result chains from non generic LoadSDNode nodes.

Later patches will continue to replace X86ISD::SUBV_BROADCAST usage.

Differential Revision: https://reviews.llvm.org/D92645
2020-12-17 10:25:25 +00:00
David Spickett
c4d89b8db0 [llvm][AArch64] Actually check expected FPU for CPUs
We were passing this as an argument but never using
it. ARM has always checked this.

Note that the FPU list is shared between ARM and AArch64
so there is no AArch64::getFPUName, just ARM::getFPUName.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D93387
2020-12-17 09:15:51 +00:00
Florian Hahn
6590180dcc [InstCombine] Preserve !annotation for newly created instructions.
When replacing an instruction with !annotation with a newly created
replacement, add the !annotation metadata to the replacement.

This mostly covers cases where the new instructions are created using
the ::Create helpers. Instructions created by IRBuilder will be handled
by D91444.

Reviewed By: thegameg

Differential Revision: https://reviews.llvm.org/D93399
2020-12-17 09:06:51 +00:00
QingShan Zhang
c9256d3620 Expand the fp_to_int/int_to_fp/fp_round/fp_extend as libcall for fp128
X86 and AArch64 expand it as libcall inside the target. And PowerPC also
want to expand them as libcall for P8. So, propose an implement in the
legalizer to common the logic and remove the code for X86/AArch64 to
avoid the duplicate code.

Reviewed By: Craig Topper

Differential Revision: https://reviews.llvm.org/D91331
2020-12-17 07:59:30 +00:00
Fangrui Song
8b5501e7ef Use basic_string::find(char) instead of basic_string::find(const char *s, size_type pos=0)
Many (StringRef) cannot be detected by clang-tidy performance-faster-string-find.
2020-12-16 23:28:32 -08:00
Wei Mi
74da8abf7f [NFC][SampleFDO] Preparation to support multiple sections with the same type
in ExtBinary format.

Currently ExtBinary format doesn't support multiple sections with the same type
in the profile. We add the support in this patch. Previously we use the section
type to identify a section uniquely. Now we introduces a LayoutIndex in the
SecHdrTableEntry and use the LayoutIndex to locate the target section. The
allocations of NameTable and FuncOffsetTable are adjusted accordingly.

Currently it works as a NFC because it won't change anything for current layout.
The test for multiple sections support will be included in another patch where a
new type of profile containing multiple sections with the same type is
introduced.

Differential Revision: https://reviews.llvm.org/D93254
2020-12-16 22:28:45 -08:00
Xiang1 Zhang
7ce08bafea [Debugify] Support checking Machine IR debug info
Add mir-check-debug pass to check MIR-level debug info.

For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D91595
2020-12-16 22:17:25 -08:00
Kazu Hirata
f195aa7b4b [GCN] Remove unused function handleNewInstruction (NFC)
The function was added without a user on Dec 22, 2016 in commit
7e274e02ae088923e67cd13b99d52644532ad1cc.  It seems to be unused since
then.
2020-12-16 21:57:48 -08:00
Kazu Hirata
c90828e99e [IR, CodeGen] Use llvm::is_contained (NFC) 2020-12-16 21:30:44 -08:00
Mircea Trofin
f7ea2a809e [NFC] factor update test function test builder as a class
This allows us to have shared logic over multiple test runs, e.g. do we
have unused prefixes, or which function bodies have conflicting outputs
for a prefix appearing in different RUN lines.

This patch is just wrapping existing functionality, and replacing its uses.
A subsequent patch would then fold the current functionality into the newly
introduced class.

Differential Revision: https://reviews.llvm.org/D93413
2020-12-16 21:12:06 -08:00
Craig Topper
e11bc7a6b9 [RISCV] Infer mask type from data type for vector vle and vse intrinsics.
The mask type should have the same number of elements as the data
type.

Similar to D93409 which did this for arithmetic intrinsics.
2020-12-16 20:56:14 -08:00
Craig Topper
c3e85e1cd2 [RISCV] Infer mask type for vector intrinsics from the data type
We can use LLVMScalarOrSameVectorWidth<0, llvm_i1_ty> to infer the mask type from the anyvector_ty. This will save us from needing to pass it to getDeclaration when creating these intrinsics from clang.

No tests updates are needed because our declarations are exploiting a behavior in the IR parser where the declaration of an intrinsic doesn't need to mention all the types as long as there isn't a name conflict in the file.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D93409
2020-12-16 20:18:30 -08:00
Xiang1 Zhang
085e55a045 Revert "[Debugify] Support checking Machine IR debug info"
This reverts commit 50aaa8c274910d78d7bf6c929a34fe58b1f45579.
2020-12-16 20:12:33 -08:00
Hsiangkai Wang
2ca38c8914 [RISCV] Define vector widening mul intrinsics.
Define vector widening mul intrinsics and lower them to V instructions.

We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>

Differential Revision: https://reviews.llvm.org/D93381
2020-12-17 11:50:33 +08:00
Hsiangkai Wang
c96dd5329b [RISCV] Define vector mul/div/rem intrinsics.
Define vector mul/div/rem intrinsics and lower them to V instructions.

We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Hsiangkai Wang <kai.wang@sifive.com>

Differential Revision: https://reviews.llvm.org/D93380
2020-12-17 11:50:17 +08:00
Hsiangkai Wang
e9ee776670 [RISCV] V does not imply F.
If users want to use vector floating point instructions, they need to
specify 'F' extension additionally.

Differential Revision: https://reviews.llvm.org/D93282
2020-12-17 10:57:36 +08:00
Matt Arsenault
1a1389661b AMDGPU: Remove SGPRSpillVGPRDefinedSet hack
These VGPRs should be reserved and therefore do not need "correct"
liveness. They should not have undef uses, which can still cause
issues.
2020-12-16 21:33:35 -05:00
Zakk Chen
d834a5b730 [RISCV] Define vle/vse intrinsics.
Define vle/vse intrinsics and lower to V instructions.

We work with @rogfer01 from BSC to come out this patch.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Zakk Chen <zakk.chen@sifive.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93359
2020-12-16 18:08:15 -08:00
Xiang1 Zhang
d9173ede20 [Debugify] Support checking Machine IR debug info
Add mir-check-debug pass to check MIR-level debug info.

For IR-level, currently, LLVM have debugify + check-debugify to generate
and check debug IR. Much like the IR-level pass debugify, mir-debugify
inserts sequentially increasing line locations to each MachineInstr in a
Module, But there is no equivalent MIR-level check-debugify pass, So now
we support it at "mir-check-debug".

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D91595
2020-12-16 18:04:05 -08:00
Johannes Doerfert
62ee447a70 [OpenMP] Add initial support for omp [begin/end] assumes
The `assumes` directive is an OpenMP 5.1 feature that allows the user to
provide assumptions to the optimizer. Assumptions can refer to
directives (`absent` and `contains` clauses), expressions (`holds`
clause), or generic properties (`no_openmp_routines`, `ext_ABCD`, ...).

The `assumes` spelling is used for assumptions in the global scope while
`assume` is used for executable contexts with an associated structured
block.

This patch only implements the global spellings. While clauses with
arguments are "accepted" by the parser, they will simply be ignored for
now. The implementation lowers the assumptions directly to the
`AssumptionAttr`.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D91980
2020-12-16 20:02:49 -06:00
Nico Weber
fb6eef330d [gn build] (manually) port ddffcdf0a66 2020-12-16 20:49:56 -05:00
Arthur Eubanks
e9a42c1b56 [test] Cleanup some CGSCCPassManager tests
Don't iterate over SCC as we potentially modify it.
Verify module (and fix some broken ones).
Only run pass once and make sure that it's actually run.
Rename tests to just end in a number since I'm planning on adding a
bunch more which won't have good individual names. Instead, add comments
on the transformations that each test does.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D93427
2020-12-16 16:25:55 -08:00
LLVM GN Syncbot
2be8aad619 [gn build] Port ac068e014b2 2020-12-17 00:07:28 +00:00
Hongtao Yu
66121fdf6a [CSSPGO] Consume pseudo-probe-based AutoFDO profile
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.

Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).

**Adjustment to profile format **

A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.

Differential Revision: https://reviews.llvm.org/D92347
2020-12-16 15:57:18 -08:00
Guozhi Wei
5a54185a35 [MBP] Add whole chain to BlockFilterSet instead of individual BB
Currently we add individual BB to BlockFilterSet if its frequency satisfies

  LoopFreq / Freq <= LoopToColdBlockRatio

LoopFreq is edge frequency from outside to loop header.
LoopToColdBlockRatio is a command line parameter.

It doesn't make sense since we always layout whole chain, not individual BBs.

It may also cause a tricky problem. Sometimes it is possible that the LoopFreq
of an inner loop is smaller than LoopFreq of outer loop. So a BB can be in
BlockFilterSet of inner loop, but not in BlockFilterSet of outer loop,
like .cold in the test case. So it is added to the chain of inner loop. When
work on the outer loop, .cold is not added to BlockFilterSet, so the edge to
successor .problem is not counted in UnscheduledPredecessors of .problem chain.
But other blocks in the inner loop are added BlockFilterSet, so the whole inner
loop chain can be layout, and markChainSuccessors is called to decrease
UnscheduledPredecessors of following chains. markChainSuccessors calls
markBlockSuccessors for every BB, even it is not in BlockFilterSet, like .cold,
so .problem chain's UnscheduledPredecessors is decreased, but this edge was not
counted on in fillWorkLists, so .problem chain's UnscheduledPredecessors
becomes 0 when it still has an unscheduled predecessor .pred! And it causes
problems in following various successor BB selection algorithms.

Differential Revision: https://reviews.llvm.org/D89088
2020-12-16 15:45:47 -08:00
alex-t
0182a0a11f Disable Jump Threading for the targets with divergent control flow
Details: Jump Threading does not make sense for the targets with divergent CF
         since they do not use branch prediction for speculative execution.
         Also in the high level IR there is no enough information to conclude that the branch is divergent or uniform.
         This may cause errors in further CF lowering.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93302
2020-12-17 02:40:54 +03:00
Gulfem Savrun Yeniceri
d7b7f50885 [IR] Fixed the typo in attributes test
Fixed the typo introduced in D90275.

Differential Revision: https://reviews.llvm.org/D93420
2020-12-16 15:06:23 -08:00
Harald van Dijk
c5fe190946 [X86] Avoid %fs:(%eax) references in x32 mode
The ABI explains that %fs:(%eax) zero-extends %eax to 64 bits, and adds
that the TLS base address, but that the TLS base address need not be
at the start of the TLS block, TLS references may use negative offsets.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D93158
2020-12-16 22:39:57 +00:00
Roman Lebedev
51e36a66fe [SimplifyCFG] Teach mergeEmptyReturnBlocks() to preserve DomTree
A first real transformation that didn't already knew how to do that,
but it's pretty tame - either change successor of all the predecessors
of a block and carefully delay deletion of the block until afterwards
the DomTree updates are appled, or add a successor to the block.

There wasn't a great test coverage for this, so i added extra, to be sure.
2020-12-17 01:03:50 +03:00
Roman Lebedev
785059d651 [SimplifyCFG] TryToSimplifyUncondBranchFromEmptyBlock() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev
eda37b998a [SimplifyCFG] MergeBlockIntoPredecessor() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Fixes DomTree preservation for a large number of tests,
all of which are marked as such so that they do not regress.
2020-12-17 01:03:49 +03:00
Roman Lebedev
3049d557f3 [SimplifyCFG] removeUnreachableBlocks() already knows how to preserve DomTree
... so just ensure that we pass DomTreeUpdater it into it.

Apparently, there were no dedicated tests just for that functionality,
so i'm adding one here.
2020-12-17 01:03:49 +03:00
Roman Lebedev
495950222a [NFCI][SimplifyCFG] Mark all the SimplifyCFG tests that already don't invalidate DomTree as such
First step after e1133179587dd895962a2fe4d6eb0cb1e63b5ee2,
in these tests, DomTree is valid afterwards, so mark them as such,
so that they don't regress.

In further steps, SimplifyCFG transforms shall taught to preserve DomTree,
in as small steps as possible.
2020-12-17 01:03:49 +03:00
Fangrui Song
e27537bec4 [AArch64InstPrinter] Use * 4096 instead of << 12
Left shirting a negative integer is undefined before C++20.
2020-12-16 14:02:25 -08:00
Rong Xu
78d4fc2170 [PGO] Use the sum of profile counts to fix the function entry count
Raw profile count values for each BB are not kept after profile
annotation. We record function entry count and branch weights
and use them to compute the count when needed.  This mechanism
works well in a perfect world, but often breaks in real programs,
because of number prevision, inconsistent profile, or bugs in
BFI). This patch uses sum of profile count values to fix
function entry count to make the BFI count close to real profile
counts.

Differential Revision: https://reviews.llvm.org/D61540
2020-12-16 13:37:43 -08:00