mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00
212683 Commits

Author SHA1 Message Date
Alexander Yermolovich
0440613b8e [DWARF] Check for AddrOffsetSectionBase to work with DWO Units.
Context: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148521.html

A fix for llvm-symbolizer, and other tools like BOLT, that allows retrieving addresses when built in -gsplit-dwarf=single mode.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D96827
2021-03-15 14:46:09 -07:00
Artem Belevich
7a17da6eb6 [NVPTX] Avoid temp copy of byval kernel parameters.
Avoid making a temporary copy of a byval argument if all accesses are loads and
therefore the pointer to the parameter cannot escape.

This avoids excessive global memory accesses when each kernel makes its own
copy.

Differential revision: https://reviews.llvm.org/D98469
2021-03-15 14:27:22 -07:00
Nick Lewycky
40de5cf96a NFC: Formatting changes.
Run clang-format over these files.

Capitalize some variable names per clang-tidy's request.

Pulled out to simplify review of D98302.
2021-03-15 14:26:39 -07:00
Stanislav Mekhanoshin
17050632cf [AMDGPU] Fix copyPhysReg to not produce unaligned vgpr access
RA can insert something like a sub1_sub2 COPY of a wide VGPR
tuple which results in an unaligned access with v_pk_mov_b32
after the copy is expanded. This is a regression after D97316.

Differential Revision: https://reviews.llvm.org/D98549
2021-03-15 14:14:30 -07:00
Florian Hahn
1aed9ced3b [AnnotationRemarks] Remove unneeded Function.h include (NFC). 2021-03-15 21:09:35 +00:00
Nico Weber
19f76964c4 [gn build] merge 9bcf0eff99 2021-03-15 17:05:05 -04:00
Nico Weber
f75a03ef22 [gn build] kind of merge af2796c76d2f
Good enough for now. If we need more, we'll do the usual
platform-dependent hardcoding that in practice works for everything else
too.
2021-03-15 17:01:00 -04:00
Stanislav Mekhanoshin
fc6febe595 [AMDGPU] Fixed msan failure with uninitialized value 2021-03-15 13:58:19 -07:00
Kirill Bobyrev
da5206f100 [clangd] Optionally add reflection for clangd-index-server
This was originally landed without the optional part and reverted later:

8080ea4c4b

Reviewed By: kadircet

Differential Revision: https://reviews.llvm.org/D98404
2021-03-15 21:07:25 +01:00
Markus Böck
9badf35f5b Revert line accidentally included in af2796c76d2ff4b73165ed47959afd35a769beee 2021-03-15 21:03:46 +01:00
Sanjay Patel
677d887642 [SLP] update stale test comments; NFC
These bugs were fixed with 0a8e7ca402eb
2021-03-15 16:02:46 -04:00
Stanislav Mekhanoshin
196e7f3138 [AMDGPU] Use single cache policy operand
Replace individual operands GLC, SLC, and DLC with a single cache_policy
bitmask operand. This will reduce the number of operands in MIR and I hope
the amount of code. These operands are mostly 0 anyway.

An additional advantage is that the parser will accept these flags in any
order, unlike now.

Differential Revision: https://reviews.llvm.org/D96469
2021-03-15 13:00:59 -07:00
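
To illustrate the single cache_policy operand described in 196e7f3138, here is a minimal Python sketch of folding separate GLC, SLC, and DLC flags into one bitmask. The bit positions, the lowercase token spellings, and the `parse_cache_policy` helper are assumptions made for this sketch, not taken from the patch.

```python
# Hypothetical bit assignments, chosen only for this sketch.
GLC = 1 << 0
SLC = 1 << 1
DLC = 1 << 2
FLAG_BITS = {"glc": GLC, "slc": SLC, "dlc": DLC}


def parse_cache_policy(tokens):
    """Fold flag tokens, given in any order, into one bitmask operand."""
    mask = 0
    for tok in tokens:
        if tok not in FLAG_BITS:
            raise ValueError(f"unknown cache policy flag: {tok}")
        mask |= FLAG_BITS[tok]
    return mask


# The order of the flags does not affect the resulting operand.
assert parse_cache_policy(["glc", "dlc"]) == parse_cache_policy(["dlc", "glc"])
```

Because each flag owns one bit, an instruction with no flags carries the single operand 0, which matches the observation that these operands are mostly 0.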
Markus Böck
1be4884f17 [test] Add ability to get error messages from CMake for errc substitution
Visual Studio's implementation of the C++ Standard Library does not use strerror to produce a message for std::error_code, unlike other standard libraries such as libstdc++ or libc++ that might be used.

This patch adds a CMake script that runs a C++ program to get the error messages for the POSIX error codes and passes them on to lit through an optional config parameter.

If the config parameter is not set, or getting the messages failed (for example, in a cross-compiling configuration without an emulator), it will fall back to using Python's strerror function.

Differential Revision: https://reviews.llvm.org/D98278
2021-03-15 20:56:08 +01:00
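
The fallback described in 1be4884f17 can be sketched roughly as follows. The `errc_message` helper and the dict shape of `configured` are hypothetical stand-ins for the optional lit config parameter; only `os.strerror` and the `errno` constants are real Python APIs.

```python
import errno
import os


def errc_message(code, configured=None):
    """Return the expected message for an errno value.

    `configured` stands in for the optional config parameter: a dict
    mapping errno values to messages (hypothetical shape).
    """
    if configured and code in configured:
        return configured[code]
    # Parameter missing or incomplete: fall back to Python's strerror.
    return os.strerror(code)
```

A test harness would call `errc_message(errno.ENOENT, configured)` when it builds the substitution, so the expected text matches whichever standard library produced the tool's output.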
Wenlei He
93a5ec7e97 [CSSPGO] Load context profile for external functions in PreLink and populate ThinLTO import list
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.

For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since the profile is flattened, a few things need to happen for it to work:

 - When loading an input profile in extended binary format, we need to load all child context profiles whose parent is in the current module, so that the context trie for the current module includes potential cross-module inlinees.
 - To make the above happen, we need to know whether the input profile is a CSSPGO profile before we start reading function profiles, hence a flag is added to the profile summary section.
 - When searching for cross-module inline candidates, we need to walk through the context trie instead of the nested inlinee profile (callsite samples of the AutoFDO profile).
 - Now that we have more accurate counts with CSSPGO, we switched to using the entry count instead of the total count to decide whether an external callee is potentially beneficial to inline. This makes it consistent with how we determine whether a call target is a potential inline candidate.

Differential Revision: https://reviews.llvm.org/D98590
2021-03-15 12:22:15 -07:00
Fangrui Song
0fdd473f7e Change void getNoop(MCInst &NopInst) to MCInst getNop()
Prefer (self-documenting) return values to output parameters (which are
liable to be used).
While here, rename Noop to Nop which is more widely used and improves
consistency with hasEmitNops/setEmitNops/emitNop/etc.
2021-03-15 12:05:34 -07:00
Craig Topper
bea426ec19 [RISCV] Add RISCVISD::BR_CC similar to RISCVISD::SELECT_CC.
This allows me to introduce similar combines for branches as
we have recently added for SELECT_CC. Some of them are less
useful for standalone setccs and only help branch instructions.
By having a BR_CC node it's easier to only affect branches.

I'm using CondCodeSDNode to make isel patterns easier to
write so we can refer to the codes by name. SELECT_CC uses a
constant instead.

I've translated the condition code just like SELECT_CC so
we need fewer patterns for the swapped conditions. This
includes special cases for X < 1 and X > -1 that get translated
to blez and bgez by using a 0 constant.

computeKnownBitsForTargetNode support for SELECT_CC is added
to allow MaskedValueIsZero to work for cases where the true
and false values of the SELECT_CC are setccs and the
result of the SELECT_CC is used by a BR_CC. This was needed
to avoid regressions in some of the overflow tests.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98159
2021-03-15 11:54:01 -07:00
Philipp Tomsich
e1e0468f14 [RISCV] Add isel-patterns to optimize (a < 1) into blez (a <= 0)
The following code-sequence showed up in a testcase (isolated from
SPEC2017) for if-conversion and vectorization when searching for the
maximum in an array:
        addi    a2, zero, 1
        blt     a1, a2, .LBB0_5
which can be expressed as `bge zero,a1,.LBB0_5`/`blez a1,.LBB0_5`.

More generally, we want to express (a < 1) as (a <= 0).

This adds the required isel-pattern and updates the testcases.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98449
2021-03-15 11:32:43 -07:00
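
The rewrite in e1e0468f14 relies on (a < 1) and (a <= 0) agreeing for every signed integer. A small exhaustive check over an 8-bit range makes that concrete; the width and the function names are choices made for this sketch only.

```python
BITS = 8  # small width so the check can be exhaustive
values = range(-(1 << (BITS - 1)), 1 << (BITS - 1))


def blt_one(a):
    """Original form: compare against a materialized constant 1."""
    return a < 1


def blez(a):
    """Rewritten form: bge zero, a (that is, 0 >= a)."""
    return 0 >= a


# Collect any input where the two forms disagree.
mismatches = [a for a in values if blt_one(a) != blez(a)]
```

The second form needs no `addi` to materialize the constant, since the zero register is always available.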
Stelios Ioannou
5ca2e09258 [AArch64] Implement __rndr, __rndrrs intrinsics
This patch implements the __rndr and __rndrrs intrinsics to provide access to the random
number instructions introduced in Armv8.5-A. They are only defined for the AArch64
execution state and are available when __ARM_FEATURE_RNG is defined.

These intrinsics store the random number in their pointer argument and return a status
code indicating whether the generation succeeded. The difference between __rndr and __rndrrs is that the latter
intrinsic reseeds the random number generator.

The instructions write the NZCV flags indicating the success of the operation that we can
then read with a CSET.

[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
[2] https://bugs.llvm.org/show_bug.cgi?id=47838

Differential Revision: https://reviews.llvm.org/D98264

Change-Id: I8f92e7bf5b450e5da3e59943b53482edf0df6efc
2021-03-15 17:51:48 +00:00
Juneyoung Lee
5173305fd6 [AssumeBundles] Add nonnull/align to op bundle if noundef exists
This is a patch to add nonnull and align to assume's operand bundle
only if noundef exists.
Since nonnull and align in fn attr have poison semantics, they should be
paired with noundef or noundef-implying attributes to be immediate UB.

Reviewed By: jdoerfert, Tyker

Differential Revision: https://reviews.llvm.org/D98228
2021-03-16 10:23:42 +09:00
Fraser Cormack
33c1eeca58 [CodeGen] Fix issues with scalable-vector INSERT/EXTRACT_SUBVECTORs
This patch addresses a few issues when dealing with scalable-vector
INSERT_SUBVECTOR and EXTRACT_SUBVECTOR nodes.

When legalizing in DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR, we
store the low and high halves to the stack separately. The offset for
the high half was calculated incorrectly.

Additionally, we can optimize this process when we can detect that the
subvector is contained entirely within the low/high split vector type.
While this optimization is valid on scalable vectors, when performing
the 'high' optimization, the subvector must also be a scalable vector.
Note that the 'low' optimization is still conservative: it may be
possible to insert v2i32 into the low half of a split nxv1i32/nxv1i32,
but we can't guarantee it. It is always possible to insert v2i32 into
nxv2i32 or v2i32 into nxv4i32+2 as we know vscale is at least 1.

Lastly, in SelectionDAG::isSplatValue, we early-exit on the extracted subvector value
type being a scalable vector, forgetting that we can also extract a
fixed-length vector from a scalable one.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98495
2021-03-15 17:04:21 +00:00
Nico Weber
4f01a02f9f [gn build] (semi-manually) port b136a74efc54 2021-03-15 12:51:12 -04:00
Christopher Tetreault
909d35077f [CMake] Require python 3.6 if enabling LLVM test targets
The lit test suite uses python 3.6 features. Rather than a strange
python syntax error upon running the lit tests, we will require the
correct version in CMake.

Reviewed By: serge-sans-paille, yln

Differential Revision: https://reviews.llvm.org/D95635
2021-03-15 09:50:39 -07:00
Craig Topper
b807b27145 [RISCV] Improve legalization of i32 UADDO/USUBO on RV64.
The default legalization uses zero extends that require a pair of shifts
on RISCV. Instead we can take advantage of the fact that unsigned
compares work equally well on sign extended inputs. This allows
us to use addw/subw and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D98233
2021-03-15 09:30:23 -07:00
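
The claim in b807b27145 that unsigned compares work equally well on sign-extended inputs can be checked with a small model. This is a sketch under assumptions: the helper names are invented here, and 64-bit registers are modeled as non-negative Python integers masked to 64 bits.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sext32_to_64(x):
    """Sign-extend the low 32 bits of x to a 64-bit register image."""
    x &= MASK32
    return (x | ~MASK32) & MASK64 if x & (1 << 31) else x


def uaddo_reference(a, b):
    """i32 unsigned add with overflow, computed directly."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total > MASK32


def uaddo_sext(a, b):
    """Same result using only sign-extended values and an unsigned compare."""
    a64 = sext32_to_64(a)
    # Models addw: add, then sign-extend the low 32 bits of the sum.
    sum64 = sext32_to_64(a64 + sext32_to_64(b))
    # Register images are non-negative ints, so < is an unsigned compare.
    return sum64 & MASK32, sum64 < a64
```

Sign extension maps the upper half of the 32-bit range to the top of the 64-bit range, so it preserves unsigned order, which is why the overflow test `sum < a` still holds.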
Simon Pilgrim
ae2fffa9ef [X86][SSE] isHorizontalBinOp - ensure we clear any unused source operands to improve HADD/SUB matching
Our shuffle matching for HADD/SUB patterns wasn't clearing repeated ops in 'fake unary' style shuffle masks (unpack(x,x) etc.), preventing matching of add(fakeunary(),fakeunary()) style patterns.
2021-03-15 16:24:29 +00:00
Zahira Ammarguellat
75a089166d [NFC] Fix "unused parameter" error revealed in the Linux self-build. 2021-03-15 12:17:11 -04:00
Sanjay Patel
69f00a58ef [InstSimplify] ctlz({signbit} >>u x) --> x
The motivating pattern was handled in 0a2d69480d ,
but we should have this for symmetry.

But this really highlights that we could generalize for
any shifted constant if we match this in instcombine.

https://alive2.llvm.org/ce/z/MrmVNt
2021-03-15 12:03:35 -04:00
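
The fold in 69f00a58ef can be verified exhaustively at a small bit width. The 8-bit width and the `ctlz` helper are assumptions of this sketch; shift amounts at or beyond the width are excluded because such shifts do not produce a defined value in LLVM IR.

```python
BITS = 8
SIGNBIT = 1 << (BITS - 1)


def ctlz(value, bits=BITS):
    """Count leading zeros of a `bits`-wide unsigned value."""
    return bits - value.bit_length()


# Shifting the sign bit right by x leaves exactly x leading zeros.
results = {x: ctlz(SIGNBIT >> x) for x in range(BITS)}
```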
Sanjay Patel
6b99ab8b48 [InstSimplify] add tests for ctlz of shifted constant; NFC 2021-03-15 12:03:35 -04:00
LLVM GN Syncbot
54ae663167 [gn build] Port 13e49dcee48f 2021-03-15 15:24:41 +00:00
Jon Chesterfield
c86d684da8 [amdgpu] Implement lower function LDS pass
Local variables are allocated at kernel launch. This pass collects global
variables that are used from non-kernel functions, moves them into a new struct
type, and allocates an instance of that type in every kernel. Uses are then
replaced with a constantexpr offset.

Prior to this pass, accesses from a function are compiled to trap. With this
pass, most such accesses are removed before reaching codegen. The trap logic
is left unchanged by this pass. It is still reachable for the cases this pass
misses, notably the extern shared construct from hip and variables marked
constant which survive the optimizer.

This is of interest to the openmp project because the deviceRTL runtime library
uses cuda shared variables from functions that cannot be inlined. Trunk llvm
therefore cannot compile some openmp kernels for amdgpu. In addition to the
unit tests attached, this patch applied to ROCm llvm with fixed-abi enabled
and the function pointer hashing scheme deleted passes the openmp suite.

This lowering will use more LDS than strictly necessary. It is intended to be
a functionally correct fallback for cases that are difficult to target from
future optimisation passes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D94648
2021-03-15 15:24:01 +00:00
Simon Pilgrim
2593dc40a4 [X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles.
Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.
2021-03-15 15:01:29 +00:00
David Green
19dfc4f3fc [AArch64] Zero extended extract_vector_elt pattern
This adds a pattern for i64 zext_inreg(i32 extract_vector_elt X),
producing a single UMOVvi16 instruction that is already expected to
clear the top bits. The exact pattern that this matches is
and(anyext(vector_extract X, lane), 0xff), similar to the sext patterns
higher up in the same file.

Differential Revision: https://reviews.llvm.org/D98599
2021-03-15 14:56:20 +00:00
Amy Kwan
d53e3aade2 [NFC][PowerPC] Add additional load/store test cases
This patch adds additional load/store test cases involving scalars, vectors,
and PC-Rel in preparation for the refactored load and store implementation
introduced in D93370.

Differential Revision: https://reviews.llvm.org/D97391
2021-03-15 08:54:38 -05:00
Wael Yehia
187913c5cf [PATCH] fix location of test case
from D97507.
2021-03-15 09:34:24 -04:00
Anton Afanasyev
0e939ab792 [SLP][Test] Precommit test for PR40522 2021-03-15 15:53:54 +03:00
Carl Ritson
aa27a4d8bb [AMDGPU] Fix shortfalls in WQM marking
When tracking defined lanes through phi nodes in the live range
graph each branch of the phi must be handled independently.
Also rewrite the marking algorithm to reduce unnecessary
operations.

Previously a shared set of defined lanes was used which caused
marking to stop prematurely. This was observable in existing lit
tests, but test patterns did not cover this detail.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98614
2021-03-15 21:44:15 +09:00
Simon Pilgrim
f127ae93b8 [X86][SSE] canonicalizeShuffleWithBinOps - add X86ISD::PSHUFB handling.
Recommit rGcd938ab162b0ac560dd0e9fee290980c7e0e47e5 with an early-out if the pshufb would introduce zeros across the binop.
2021-03-15 12:43:30 +00:00
Bradley Smith
19a711a469 [AArch64][SVE] Add unpredicated ld1/st1 patterns for reg+reg addressing modes
Differential Revision: https://reviews.llvm.org/D95677
2021-03-15 12:36:28 +00:00
Simon Pilgrim
eaf9528054 Revert rG9ba577eca2e339726bfaad4e615c6324a705b292 "[X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI."
Sorry this wasn't supposed to be committed yet (and certainly not tagged as NFCI....)
2021-03-15 12:23:44 +00:00
Nikita Popov
7180decf64 Revert "[NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles"
This reverts commit aa440ba24dc25e4c95f6dcf8ff647024f3b12661.

This has a non-trivial compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0c5b789c7342ee8384507c3242fc256e23248c4d&to=aa440ba24dc25e4c95f6dcf8ff647024f3b12661&stat=instructions

I don't believe this is the correct way to address the issue in
this case.
2021-03-15 13:12:39 +01:00
Simon Pilgrim
7525992d5b [X86][SSE] canonicalizeShuffleWithBinOps - handle target shuffles. NFCI.
Fold SHUFFLE(BINOP(SHUFFLE(X),SHUFFLE(Y))) -> BINOP(SHUFFLE'(X),SHUFFLE'(Y)) style patterns as well as the existing shuffles of constants.
2021-03-15 11:59:25 +00:00
Stephen Kelly
9bf084b8f1 [AST] Add generator for source location introspection
Generate a json file containing descriptions of AST classes and their
public accessors which return SourceLocation or SourceRange.

Use the JSON file to generate a C++ API and implementation for accessing
the source locations and method names for accessing them for a given AST
node.

This new API can be used to implement 'srcloc' output in clang-query:

  http://ce.steveire.com/z/m_kTIo

The JSON file can also be used to generate bindings for other languages,
such as Python and Javascript:

  https://steveire.wordpress.com/2019/04/30/the-future-of-ast-matching

In this first version of this feature, only the accessors for Stmt
classes are generated, not Decls, TypeLocs etc.  Those can be added
after this change is reviewed, as this change is mostly about
infrastructure of these code generators.

Also in this version, the platforms/cmake configurations are excluded as
much as possible so that support can be added iteratively.  Currently a
break on any platform causes a revert of the entire feature.  This way,
the `OR WIN32` can be removed in a future commit and if it breaks the
buildbots, only that commit gets reverted, making the entire process
easier to manage.

Differential Revision: https://reviews.llvm.org/D93164
2021-03-15 10:52:44 +00:00
Roman Lebedev
39d655ee4f [NFCI][ValueTracking] getUnderlyingObject(): gracefully handle cycles
Normally, this function just doesn't bother about cycles,
and hopes that the caller supplied small-enough depth
so that at worst it will take a potentially large,
but limited amount of time. But that obviously doesn't work
if there is no depth limit.

This reapplies 36f1c3db66f7268ea3183bcf0bbf05b3e1c570b4,
but without asserting, just bailing out once a cycle is detected.
2021-03-15 13:51:02 +03:00
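
As a rough model of the bailout described in 39d655ee4f, the sketch below walks a chain of "derived from" links with no depth limit and stops once a value repeats. It is not the LLVM implementation: `underlying_object` and the `base_of` dict are hypothetical names for this illustration.

```python
def underlying_object(start, base_of):
    """Follow `base_of` links from `start`; stop if a value repeats.

    `base_of` is a hypothetical dict mapping a value to the value it is
    derived from; values with no entry count as underlying objects.
    """
    seen = set()
    current = start
    while current in base_of:
        if current in seen:
            return current  # cycle detected: bail out instead of looping
        seen.add(current)
        current = base_of[current]
    return current
```

The visited set is the extra bookkeeping that guarantees termination when no depth limit is supplied.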
Fraser Cormack
a81aeba1b5 [RISCV] Support fixed-length vectors in the calling convention
This patch adds fixed-length vector support to the calling convention
when RVV is used to lower fixed-length vectors. The scheme follows the
regular vector calling convention for the argument/return registers, but
uses scalable vector container types as the LocVTs, and converts to/from
the fixed-length vector value types as required.

Fixed-length vector types may be split when the combination of minimum
VLEN and the maximum allowable LMUL is not large enough to fully contain
the vector. In this case the behaviour differs between fixed-length
vectors passed as parameters and as return values:
1. For return values, vectors must be passed entirely via registers or
via the stack.
2. For parameters, unlike scalar values, split vectors continue to be
passed by value, and are split across multiple registers until there are
no remaining registers. Thus vector parameters may be found partly in
registers and partly on the stack.

As with scalable vectors, the first fixed-length mask vector is passed
via v0. Split mask fixed-length vectors are passed first via v0 and then
via the next available vector register: v8,v9,etc.

The handling of vector return values uses all available argument
registers v8-v23 which does not adhere to the calling convention we're
supposedly implementing, but since this issue affects both fixed-length
and scalable-vector values, it was left as-is.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97954
2021-03-15 10:43:51 +00:00
Jay Foad
8f1047f96b [AMDGPU] Use depth first iterator instead of recursive DFS. NFCI.
The reason for this is to avoid deep recursion in DFS() which can cause
stack overflow on large CFGs, especially on Windows.

Differential Revision: https://reviews.llvm.org/D98528
2021-03-15 10:32:55 +00:00
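
The motivation in 8f1047f96b, avoiding deep recursion on large CFGs, applies to any depth-first walk. Below is a minimal sketch of the iterative form with an explicit stack; the dict-of-successors graph and the `dfs_preorder` name are assumptions of this example, not the LLVM iterator API.

```python
def dfs_preorder(graph, entry):
    """Visit nodes reachable from `entry` depth-first with an explicit stack."""
    visited = set()
    order = []
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push successors in reverse so the first successor is visited first.
        stack.extend(reversed(graph.get(node, [])))
    return order


# A chain far deeper than Python's default recursion limit.
deep_chain = {i: [i + 1] for i in range(100_000)}
```

A recursive version would need one call frame per node on `deep_chain`; here the pending work lives in a heap-allocated list, so the depth of the graph does not affect the call stack.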
Simon Pilgrim
53a291a3fe Fix MSVC "switch statement contains 'default' but no 'case' labels" warning. NFCI. 2021-03-15 09:45:45 +00:00
Simon Pilgrim
e9b7429757 [X86][SSE] Attempt to merge single-op hops for slow targets.
For slow-hop targets, see if any single-op hops are duplicating work already done on another (dual-op) hop, which can sometimes occur as isHorizontalBinOp tries to find potential duplicates (but can't merge them itself). If so, reuse the other hop and shuffle the result.
2021-03-15 09:30:20 +00:00
Roman Lebedev
a289925031 Revert "[NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered"
This reverts commit 36f1c3db66f7268ea3183bcf0bbf05b3e1c570b4.
Seems to make bots unhappy.
2021-03-15 12:00:59 +03:00
Roman Lebedev
b4b4a1122d [NFCI][ValueTracking] getUnderlyingObject(): assert that no cycles are encountered
Jeroen Dobbelaere in
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149206.html
is reporting that this function can end up in an endless loop
when called from SROA w/ full restrict patches.

For now, simply ensure that such problems are caught earlier/easier.
2021-03-15 11:52:31 +03:00
Max Kazantsev
dee7bb8299 [Test] Replace checks with auto-generated checks 2021-03-15 14:32:00 +07:00
Hongtao Yu
9e418be560 [NFC][Inliner] Debugging support to print function size after each inlining.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D98439
2021-03-14 22:11:53 -07:00