mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00
Commit Graph

209883 Commits

Author SHA1 Message Date
Nico Weber
b16664144b [gn build] (manually) port 933518fff82c 2021-01-19 18:51:39 -05:00
Ian Levesque
cb3e4b9e0e [xray] Honor xray-never function-instrument attribute
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.

Differential Revision: https://reviews.llvm.org/D89441
2021-01-19 18:47:09 -05:00
Wei Mi
6e11fd80e6 Fix Wmissing-field-initializers warnings. 2021-01-19 15:26:52 -08:00
Wei Mi
c2ff32c8b0 [SampleFDO] Add the support to split the function profiles with context into
separate sections.

For ThinLTO, all function profiles without context have already been annotated
to outline functions, where possible, in the prelink phase. In the postlink
phase, profile annotation is only meaningful for function profiles with
context. If the profile is large, it is better to split it into two parts,
one with context and one without, so that profile reading in the postlink
phase only has to read the part with context. To support this splitting, we
extend the ExtBinary format to allow different section arrangements. This
makes it possible to add other section layouts in the future without creating
a new class that inherits from the ExtBinary class.

Differential Revision: https://reviews.llvm.org/D94435
2021-01-19 15:16:19 -08:00
Sam Clegg
a3e0bfc296 Revert "[WebAssembly] call_indirect issues table number relocs"
This reverts commit 418df4a6ab35d343cc0f2608c90a73dd9b8d0ab1.

This change broke emscripten tests, I believe because it started
generating a 5-byte-wide table index in the call_indirect instruction.
Neither v8 nor wabt seems to be able to handle that.  The spec
currently says that this is a single 0x0 byte and:

"In future versions of WebAssembly, the zero byte occurring in the
encoding of the call_indirect instruction may be used to
index additional tables."

So we need to revisit this change.  For backwards compat I guess
we need to guarantee that __indirect_function_table is always at
address zero.   We could also consider making this a single-byte
relocation with an assert if we have more than 127 tables (for now).

Differential Revision: https://reviews.llvm.org/D95005
2021-01-19 15:06:07 -08:00
Craig Topper
6d7ba9369f [RISCV] Remove NotHasStdExtZbb predicate from zext.h/sext.b/sext.h InstAliases. NFC
NotHasStdExtZbb doesn't have an AssemblerPredicate associated with it
so it didn't do anything. We don't need it either because the sorting
rules in tablegen prioritize by number of predicates. So the
dedicated instructions in the B extension that have predicates
will be prioritized automatically.
2021-01-19 14:31:48 -08:00
Arthur Eubanks
19e4267ba4 [polly][NewPM][test] Fix polly tests under -enable-new-pm
In preparation for turning on opt's -enable-new-pm by default, this pins
uses of passes via the legacy "opt -passname" with pass names beginning
with "polly-" and "polyhedral-info" to the legacy PM. Many of these
tests use -analyze, which isn't supported in the new PM.

(This doesn't affect uses of "opt -passes=passname").

rL240766 accidentally removed `-polly-prepare` in
phi_not_grouped_at_top.ll, and it also doesn't use the output of
-analyze.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D94266
2021-01-19 12:38:58 -08:00
Mircea Trofin
58819de749 [NFC] Disallow unused prefixes under Other
Differential Revision: https://reviews.llvm.org/D94853
2021-01-19 12:22:29 -08:00
Alexey Bataev
13011c3c3f Revert "[SLP]Merge reorder and reuse shuffles."
This reverts commit 438682de6a38ac97f89fa38faf5c8dc9b09cd9ad to fix the
bug with the reducing size of the resulting vector for the entry node
with multiple users.
2021-01-19 11:48:04 -08:00
Jeroen Dobbelaere
6f9b7491b6 [NFC] cleanup noalias2.ll test
D75825 and D75828 modified llvm/test/Transforms/Inline/noalias2.ll to handle llvm.assume. The checking though was broken.
The NO_ASSUME has been replaced by a normal CHECK; the ASSUME rules were never triggered and have been removed.
The test checks have been regenerated.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D94978
2021-01-19 20:47:02 +01:00
Mitch Phillips
c784193edd Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it"
This reverts commit 6529d7c5a45b1b9588e512013b02f891d71bc134.

Reason: Broke the ASan buildbots.
http://lab.llvm.org:8011/#/builders/99/builds/1567
2021-01-19 11:45:48 -08:00
Jonas Devlieghere
ef10d9ddc8 [llvm] Protect signpost map with a mutex
Use a mutex to protect concurrent access to the signpost map. This fixes
nondeterministic crashes in LLDB that appeared after using signposts in
the timer implementation.

Differential revision: https://reviews.llvm.org/D94285
2021-01-19 11:41:54 -08:00
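The idea of guarding a shared map with a mutex can be sketched as follows. This is a minimal illustration in Python, not the LLVM implementation: the `SignpostMap` class, its `get_or_create` method, and the `run_demo` helper are hypothetical names chosen for this example.

```python
import threading


class SignpostMap:
    """Hypothetical stand-in for a map shared between threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}
        self._next_id = 0

    def get_or_create(self, key):
        # Lookup and insertion happen under one lock, so two threads
        # can never both insert an entry for the same key.
        with self._lock:
            if key not in self._ids:
                self._ids[key] = self._next_id
                self._next_id += 1
            return self._ids[key]


def run_demo(num_threads=8, keys_per_thread=1000):
    """Hammer one shared map from several threads and return it."""
    signposts = SignpostMap()

    def worker():
        for key in range(keys_per_thread):
            signposts.get_or_create(key)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return signposts
```

Because the check and the insert form one critical section, every key ends up with exactly one ID regardless of how the threads interleave.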
Mariya Podchishchaeva
9a03068d26 [ScalarizeMaskedMemIntrin] Add missing dependency
The pass has a dependency on 'TargetTransformInfoWrapperPass', but the
corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.

Differential Revision: https://reviews.llvm.org/D94916
2021-01-19 22:33:47 +03:00
Nikita Popov
54a7845eef Reapply [InstCombine] Replace one-use select operand based on condition
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if it can cause side-effects or UB (aka is not speculatable).

Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.

-----

InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-19 20:26:38 +01:00
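The speculation concern in the commit above can be modelled with a small sketch. It assumes a simplified picture in which, as with an IR select, both operands are computed before the selection happens. The functions `select`, `original`, and `folded` are hypothetical names for this illustration and are not part of InstCombine.

```python
def select(cond, true_val, false_val):
    # Like an IR select: both operands have already been computed.
    return true_val if cond else false_val


def original(x, y, z, f):
    fx = f(x)  # executes regardless of the condition
    return select(x == y, fx, z)


def folded(x, y, z, f):
    fy = f(y)  # operand replaced inside the one-use instruction
    return select(x == y, fy, z)


def pure(v):
    # No side effects and no trapping inputs: safe to speculate.
    return v * 3 + 1


def trapping(v):
    # Raises for v == 0, standing in for an operation with UB.
    return 10 // v
```

With `pure`, `original` and `folded` agree for every `x`. With `trapping` and `y == 0`, `original(5, 0, 7, trapping)` returns `7`, while `folded(5, 0, 7, trapping)` raises `ZeroDivisionError`, because `f(y)` now runs even when the condition is false. That is the situation the speculation check is meant to exclude.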
Nikita Popov
27bab255e6 [InstCombine] Add additional tests for select operand replacement (NFC)
In particular, add tests for speculatable and non-speculatable
instructions.
2021-01-19 20:26:38 +01:00
Craig Topper
14aa520218 [RISCV] Add DAG combine to turn (setcc X, 1, setne) -> (setcc X, 0, seteq) if we can prove X is 0/1.
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.

Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into detecting
(xor X, 1) and converting it to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. That
perturbed the AMDGPU tests considerably, and I didn't look at them closely.
It caused a few changes on a couple of other targets, but didn't seem
to be much of an improvement, if any.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D94730
2021-01-19 11:21:48 -08:00
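The equivalence behind this combine is easy to check in isolation. The sketch below is a plain-Python illustration with hypothetical helper names (`setne_one`, `seteq_zero`). It does not use any LLVM API and only shows why the rewrite needs X to be known to be 0 or 1.

```python
def setne_one(x):
    # Models (setcc X, 1, setne): 1 if X != 1, else 0.
    return int(x != 1)


def seteq_zero(x):
    # Models (setcc X, 0, seteq): 1 if X == 0, else 0.
    return int(x == 0)


def equivalent_for(values):
    """True if both comparisons agree on every value in `values`."""
    return all(setne_one(v) == seteq_zero(v) for v in values)
```

`equivalent_for([0, 1])` holds, whereas any other value such as 2 makes the two comparisons disagree, which is why the combine has to prove the 0/1 property first.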
Jeroen Dobbelaere
116cd71f2c [noalias.decl] Look through llvm.experimental.noalias.scope.decl
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93042
2021-01-19 20:09:42 +01:00
Brendon Cahoon
ccb1095e5c [Hexagon] Fix segment start to adjust for gaps between segments
The Hexagon Vector Combine pass generates stores for a complete
aligned vector. The start of each section is a multiple of the
vector size, so that value is passed to normalize to compute
the offset of the stores in the section.  The first store may
not occur at offset 0 when there is a gap between sections.
2021-01-19 12:49:39 -06:00
Jay Foad
9f5dab7186 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Jessica Paquette
05fff88674 Fix buildbot after cfc60730179042a93cb9cb338982e71d20707a24
Windows buildbots were not happy with using find_if + instructionsWithoutDebug.

In cfc60730179042a9, instructionsWithoutDebug is not technically necessary. So,
just iterate over the block directly.

http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio
2021-01-19 10:38:04 -08:00
Jessica Paquette
c4d2d8a4de [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)

This tries to recognize patterns like below (assuming a little-endian target):

```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32*)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32*)a))
```

(This patch also handles the big-endian target case, in which the first
example above has a BSWAP, and the second example above does not.)

To recognize the pattern, this searches from the last G_OR in the expression
tree.

E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.

If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
    specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
2021-01-19 10:24:27 -08:00
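The arithmetic identity that the load combine relies on can be checked with a short sketch. It assumes a little-endian view of memory and uses Python's `struct` module as a stand-in for the wide load. The helper names are hypothetical and nothing here reflects the GlobalISel implementation.

```python
import struct


def or_of_shifted_bytes(a):
    # a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
    return a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)


def or_of_reversed_bytes(a):
    # a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
    return a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)


def wide_load_le(a):
    # One 32-bit little-endian load from the start of the buffer.
    return struct.unpack_from("<I", a)[0]


def bswap32(v):
    # Reverse the byte order of a 32-bit value.
    return int.from_bytes(v.to_bytes(4, "little"), "big")


data = bytes([0x11, 0x22, 0x33, 0x44])
```

For `data`, `or_of_shifted_bytes` matches `wide_load_le`, and `or_of_reversed_bytes` matches `bswap32(wide_load_le(data))`, mirroring the two patterns in the commit message.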
Fraser Cormack
00b92c78b2 [RISCV] Add ISel patterns for scalable mask exts & truncs
Original patch by @rogfer01.

This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94590
2021-01-19 18:13:15 +00:00
Abhina Sreeskantharajan
74052a31cc [SystemZ][z/OS] Fix Permission denied pattern matching
On z/OS, the error message "EDC5111I Permission denied." is not matched correctly in lit tests. This patch updates the check expression to match successfully.

Differential Revision: https://reviews.llvm.org/D94432
2021-01-19 13:05:52 -05:00
David Green
04f2bf7a46 [ARM] Expand vXi1 VSELECT's
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.

Differential Revision: https://reviews.llvm.org/D94946
2021-01-19 17:56:50 +00:00
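Expanding a select on i1 lanes into logical operations can be illustrated as below. This is a sketch of the general identity `(c & a) | (~c & b)` applied per lane. The function names are hypothetical, and the exact node sequence that the ARM backend produces after expansion is not shown here.

```python
from itertools import product


def vselect_lanes(cond, a, b):
    # Reference semantics: pick a[i] where cond[i] is set, else b[i].
    return [x if c else y for c, x, y in zip(cond, a, b)]


def vselect_logical(cond, a, b):
    # Same result using only and/or/xor on single-bit lanes.
    return [(c & x) | ((c ^ 1) & y) for c, x, y in zip(cond, a, b)]


def check_all_lanes(width=4):
    """Exhaustively compare both forms for every i1 vector of `width` lanes."""
    bits = list(product((0, 1), repeat=width))
    return all(
        vselect_lanes(c, a, b) == vselect_logical(c, a, b)
        for c in bits for a in bits for b in bits
    )
```

`check_all_lanes()` compares the two forms across all 4-lane inputs and returns `True` when they agree everywhere.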
Nikita Popov
d56d32f92d [ValueTracking] Strengthen impliesPoison reasoning
Split impliesPoison into two recursive walks, one over V, the
other over ValAssumedPoison. This allows us to reason about poison
implications in a number of additional cases that are important
in practice. This is a generalized form of D94859, which handles
the cmp to cmp implication in particular.

Differential Revision: https://reviews.llvm.org/D94866
2021-01-19 18:04:23 +01:00
Jay Foad
a8ed61d9be [AMDGPU] Fix test case for D94010 2021-01-19 16:46:47 +00:00
Jay Foad
911caa6874 [AMDGPU] Simplify test case for D94010 2021-01-19 16:36:43 +00:00
Fraser Cormack
8876d555ae [RISCV] Extend RVV VType info with the type's AVL (NFC)
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.

This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94594
2021-01-19 15:46:56 +00:00
David Green
c7d5648431 [ARM] Add MVE add.sat costs
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller than legal types
that are promoted we generate shr(qadd(shl, shl)), so the cost is 4
appropriately.

Differential Revision: https://reviews.llvm.org/D94958
2021-01-19 15:38:46 +00:00
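The `shr(qadd(shl, shl))` sequence mentioned above can be checked numerically. The sketch below covers only the signed-add case and assumes an 8-bit type promoted to 16-bit lanes. The function names are hypothetical and the code models the arithmetic only, not MVE instruction selection.

```python
def sadd_sat(a, b, bits):
    # Signed saturating add at the given bit width.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))


def sadd_sat_promoted(a, b, narrow=8, wide=16):
    # Four steps: shl, shl, saturating add at the wide width, shr.
    shift = wide - narrow
    wide_sum = sadd_sat(a << shift, b << shift, wide)
    return wide_sum >> shift  # arithmetic shift on Python ints


def check_i8():
    """Compare the promoted sequence with a direct i8 saturating add."""
    return all(
        sadd_sat_promoted(a, b) == sadd_sat(a, b, 8)
        for a in range(-128, 128)
        for b in range(-128, 128)
    )
```

Shifting both operands into the top bits makes the wide saturation trigger at exactly the points where the narrow addition would overflow, and the four operations line up with the cost of 4 quoted in the commit message.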
Valentin Clement
715d713b9e [flang][directive] Get rid of flangClassValue in TableGen
The TableGen emitter for directives has two slots for flangClass information, mainly
to keep up with the legacy OpenMP parser at the time. Now that all clauses are encapsulated in
AccClause or OmpClause, these two strings are no longer necessary and were the source of a couple
of problems while working on the generic structure checker for OpenMP.
This patch removes the flangClassValue string from DirectiveBase.td and uses the flangClass string as the
placeholder for the encapsulated class.

Reviewed By: sameeranjoshi

Differential Revision: https://reviews.llvm.org/D94821
2021-01-19 10:28:46 -05:00
Victor Huang
c88264fd68 [PowerPC] Fix the check for the instruction using FRSP/XSRSP output register
When performing peephole optimization to simplify the code, after removing
the passed FRSP/XSRSP instruction we set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.

We find the machine instruction that uses the virtual register holding the
FRSP/XSRSP result by searching all following instructions. An issue arises
when the first use of the virtual register is a debug MI, which causes:
1. The virtual register in the debug MI to be removed unexpectedly.
2. The virtual register used in the non-debug MI not to be replaced with the
  source of the FRSP/XSRSP, leaving it in an undef state.

This patch fixes the issue by searching only non-debug machine instructions
that use the virtual register holding the FRSP/XSRSP result when the virtual
register has only one non-debug use.

Differential Revision: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
2021-01-19 09:20:03 -06:00
Raul Tambre
732857e164 [CMake] Remove dead code setting policies to NEW
cmake_minimum_required(VERSION) calls cmake_policy(VERSION),
which sets all policies up to VERSION to NEW.
LLVM started requiring CMake 3.13 last year, so we can remove
a bunch of code setting policies prior to 3.13 to NEW as it
no longer has any effect.

Reviewed By: phosek, #libunwind, #libc, #libc_abi, ldionne

Differential Revision: https://reviews.llvm.org/D94374
2021-01-19 17:19:36 +02:00
David Green
218cbd9ab8 [ARM] Expand add.sat/sub.sat cost checks. NFC 2021-01-19 15:06:06 +00:00
Florian Hahn
46a4268cbe [LoopRotate] Calls not lowered to calls should not block rotation.
83daa49758a1 made loop-rotate more conservative in the presence of
function calls in the prepare-for-lto stage. The code did not properly
account for calls that are not actual function calls, like calls to
intrinsics. This patch updates the code to ensure only calls that are
lowered to actual calls are considered inline candidates.
2021-01-19 14:37:36 +00:00
Simon Pilgrim
3e379c7903 [X86] Regenerate fmin/fmax reduction tests
Add missing check-prefixes + v1f32 tests
2021-01-19 14:28:44 +00:00
Tim Northover
ed1f4159c7 AArch64: add apple-a14 as a CPU
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
2021-01-19 14:04:53 +00:00
Hans Wennborg
12b7677de1 [ThinLTO] Also prune Thin-* files from the ThinLTO cache
Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately
after they're used (either by renaming or deletion). However, we've seen
instances on Windows where this doesn't happen, probably due to the
filesystem being flaky. This is effectively a resource leak which has
prevented us from using the ThinLTO cache on Windows.

Since those temporary files are in the thinlto cache directory which we
prune periodically anyway, allowing them to be pruned too seems like a
tidy way to solve the problem.

Differential revision: https://reviews.llvm.org/D94962
2021-01-19 14:43:49 +01:00
Med Ismail Bennani
3496691e63 [llvm/Orc] Fix ExecutionEngine module build breakage
This patch updates the llvm module map to reflect changes made in
`24672ddea3c97fd1eca3e905b23c0116d7759ab8` and fixes the module builds
(`-DLLVM_ENABLE_MODULES=On`).

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
2021-01-19 14:39:06 +01:00
Caroline Concatto
90b2e7be56 [AArch64][SVE]Add cost model for vector reduce for scalable vector
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts:  the legalization cost and the horizontal
reduction.

Differential Revision: https://reviews.llvm.org/D93639
2021-01-19 11:54:16 +00:00
Simon Pilgrim
6007680bfe [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c))
If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
2021-01-19 11:04:13 +00:00
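A single-lane model can illustrate when swapping the logical shift for an arithmetic one is harmless. The sketch below assumes a 32-bit to 16-bit truncation and models one lane of a signed-saturating pack. The helper names are hypothetical, and the precise conditions checked in combineVectorSignBitsTruncation are not reproduced here.

```python
def srl32(x, c):
    # Logical shift right of a 32-bit value.
    return (x & 0xFFFFFFFF) >> c


def sra32(x, c):
    # Arithmetic shift right of a 32-bit value, returned as a signed int.
    x &= 0xFFFFFFFF
    signed = x - (1 << 32) if x & 0x80000000 else x
    return signed >> c


def trunc16(v):
    # Plain truncation to the low 16 bits.
    return v & 0xFFFF


def packss_lane(v):
    # Signed saturation from i32 to i16, returned as a 16-bit pattern.
    return max(-32768, min(32767, v)) & 0xFFFF


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xFFFF0000,
           0x12345678, 0xDEADBEEF, 0xFFFFFFFF]
```

With a shift amount of 16, `trunc16(srl32(x, 16))` and `packss_lane(sra32(x, 16))` agree for every sample, since the shifted-in bits never reach the low half and the arithmetic result always fits in 16 signed bits. With a shift of 8 the pack saturates for `0x12345678`, so the two forms differ there.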
Hans Wennborg
78cd25d7f9 Revert 5238e7b302 "[InstCombine] Replace one-use select operand based on condition"
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.

> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862

This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:

> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS of the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 11:50:56 +01:00
Jay Foad
a816310ce3 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Florian Hahn
d596025713 [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have been
vectorized earlier (by scalarizing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00
Yvan Roux
178bc607c9 [ARM][MachineOutliner] Add stack fixup feature
This patch handles cases where we have to save/restore the link register
to/from the stack and load/store instructions that use the stack are
part of the outlined region. It checks that the new offset introduces no
overflow and fixes up these instructions accordingly.

Differential Revision: https://reviews.llvm.org/D92934
2021-01-19 10:59:09 +01:00
Fraser Cormack
d6f7a6374a [RISCV] Add scalable-vector integer extension patterns
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94694
2021-01-19 09:30:36 +00:00
Tres Popp
f0a947fd0e [llvm] Prevent infinite loop in InstCombine of select statements
This fixes an issue where the RHS and LHS of the comparison operation
creating the predicate were swapped back and forth forever.

Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 10:31:48 +01:00
serge-sans-paille
1734c4008a [lit] Harmonize lit and llvm versioning
In addition to consistency, we'll hit a wall when 11.1.0 gets released, because
we cannot represent it with lit's versioning scheme.

Differential Revision: https://reviews.llvm.org/D94157
2021-01-19 10:27:14 +01:00
Lang Hames
3c8ce89cc8 [ORC] Move LookupRequest from OrcShared to Orc.
It depends on Orc types (SymbolLookupSet), so can't be part of OrcShared.
2021-01-19 20:23:47 +11:00
Tres Popp
29b8ccfd0c [llvm][nvptx] add atomicity to counter in ISelLowering
Previously uniqueCallSite could have race conditions between different
threads. Now it is accessed with an atomic RMW and will be unique
between different threads.

Differential Revision: https://reviews.llvm.org/D94784
2021-01-19 10:20:20 +01:00
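Handing out unique IDs from a counter shared by several threads can be sketched as follows. Python has no direct atomic read-modify-write on integers, so this illustration swaps in a lock around the increment. The `UniqueCounter` class and `collect_ids` helper are hypothetical and only show the uniqueness property, not the NVPTX lowering code.

```python
import threading


class UniqueCounter:
    """Hypothetical counter whose fetch-and-increment is one indivisible step."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def fetch_add(self):
        # Read and increment under the lock so no two callers see the same value.
        with self._lock:
            current = self._value
            self._value += 1
            return current


def collect_ids(num_threads=4, per_thread=500):
    """Draw IDs from one shared counter on several threads."""
    counter = UniqueCounter()
    results = [[] for _ in range(num_threads)]

    def worker(out):
        for _ in range(per_thread):
            out.append(counter.fetch_add())

    threads = [threading.Thread(target=worker, args=(r,)) for r in results]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for r in results for i in r]
```

Every ID returned by `collect_ids()` is distinct, which is the property the commit obtains with an atomic RMW.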
David Sherwood
6c0762e51e [NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost
A previous patch has already changed getInstructionCost to return
an InstructionCost type. This patch changes the other various
getXXXCost functions to return an InstructionCost too. This is a
non-functional change - I've added a few asserts that the costs
are valid in places where we're selecting between vector call
and intrinsic costs. However, since we don't yet return invalid
costs from any of the TTI implementations these asserts should
not fire.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D94065
2021-01-19 09:08:40 +00:00