1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00
Commit Graph

218511 Commits

Author SHA1 Message Date
Kai Luo
bb52bc77a5 [PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D103614
2021-07-15 01:12:09 +00:00
Kuter Dinel
bfdc40f15b [AMDGPU] Use update_test_checks.py script for annotate kernel features tests.
This patch makes the annotate kernel features tests use the update_tests_checks.py
script. Which makes it easy to update the tests.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D105864
2021-07-15 03:13:37 +03:00
Thomas Lively
3c50e4a7a7 [WebAssembly] Codegen for v128.storeX_lane instructions
Replace the experimental clang builtins and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50435.

Differential Revision: https://reviews.llvm.org/D106019
2021-07-14 16:15:25 -07:00
Jon Roelofs
daf8a095a1 [GlobalOpt] Fix a miscompile when evaluating struct initializers.
The bug was that evaluateBitcastFromPtr attempts a narrowing to a struct's 0th
element of a store that covers other elements. While this is okay on the load
side, applying it to stores causes us to miss the writes to the additionally
covered elements.

rdar://79503568

Differential revision: https://reviews.llvm.org/D105838
2021-07-14 15:37:01 -07:00
Steven Wu
d31e29ae39 [Support] Turn on SupportTest for Apple Silicon
Follow up for D106012, turn on unittest for Host on Apple Silicon.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D106020
2021-07-14 15:24:56 -07:00
Arthur Eubanks
f5788bb9c7 [docs][OpaquePtr] Remove finished task 2021-07-14 14:36:41 -07:00
Wolfgang Pieb
bf7a422513 [ARM] Fix RELA relocations for 32bit ARM.
RELA relocations for 32 bit ARM ignored the addend. Some tools generate
them instead of REL type relocations. This fixes PR50473.

    Reviewed By: MaskRay, peter.smith

    Differential Revision: https://reviews.llvm.org/D105214
2021-07-14 14:27:15 -07:00
Derek Schuff
f708d3928c [llvm-strip][WebAssembly] Support strip flags
Summary:
Add support for the basic section stripping (and keeping) flags for wasm:
strip with no flags, --strip-all, --strip-debug,
--only-section, --keep-section, and --only-keep-debug.

Factor section removal into a function and use a predicate chain like
the ELF implementation.

Reviewers: jhenderson, sbc100

Differential Revision: https://reviews.llvm.org/D73820
2021-07-14 14:17:02 -07:00
Arthur Eubanks
a8da5bdd63 Precommit test for D106017 2021-07-14 14:14:49 -07:00
Arthur Eubanks
9d99a13a85 [SimpleLoopUnswitch] Don't non-trivially unswitch loops with catchswitch exits
SplitBlock() can't handle catchswitch.

Fixes PR50973.

Reviewed By: aheejin

Differential Revision: https://reviews.llvm.org/D105672
2021-07-14 14:07:28 -07:00
Jon Roelofs
ffc7470172 [AArch64] Fix selection of G_UNMERGE <2 x s16>
Differential revision: https://reviews.llvm.org/D106007
2021-07-14 13:40:56 -07:00
Philip Reames
99a7d1e6cf [tests] Stablize tests for possible change in deref semantics
This is conceptually part of e75a2dfe.  This file contains both tests whose results don't change (with the right attributes added), and tests which fundementally regress with the current proposal.  Doing the update took some care, thus the seperate change.

Here's the e75a2dfe context repeated:

There's a potential change in dereferenceability attribute semantics in the nearish future.  See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.

This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics.  Note that for many of these cases, O3 would infer exactly these attributes on the test IR.

This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory.  There's a couple other tests which need more one-off attention, they'll be handled in another change.
2021-07-14 13:37:50 -07:00
Steven Wu
9c64643022 [Support] Get correct number of physical cores on Apple Silicon
Fix a bug that `computeHostNumPhysicalCores` is fallback to default
unknown when building for Apple Silicon macs.

rdar://80533675

Reviewed By: arphaman

Differential Revision: https://reviews.llvm.org/D106012
2021-07-14 13:29:54 -07:00
Philip Reames
652d77cf2f Global variables with strong definitions cannot be freed
With the current deref semantics, this is redundant - since we assume that anything which is dereferenceable (ever) can't be freed - but it becomes neccessary for the deref-at-point semantics.

Testing wise, this is covered by test/CodeGen/X86/hoist-invariant-load.ll when -use-dereferenceable-at-point-semantics is active.  I didn't bother duplicating the command line since a) it's an in-development mode, and b) the change is pretty obvious.
2021-07-14 13:26:18 -07:00
Philip Reames
b12ecd2ebd [tests] Stablize tests for possible change in deref semantics
There's a potential change in dereferenceability attribute semantics in the nearish future.  See llvm-dev thread "RFC: Decomposing deref(N) into deref(N) + nofree" and D99100 for context.

This change simply adds appropriate attributes to tests to keep transform logic exercised under both old and new/proposed semantics.  Note that for many of these cases, O3 would infer exactly these attributes on the test IR.

This change handles the idiomatic pattern of a dereferenceable object being passed to a call which can not free that memory.  There's a couple other tests which need more one-off attention, they'll be handled in another change.
2021-07-14 13:05:43 -07:00
Stanislav Mekhanoshin
1dae83cabb [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ingored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
Alexey Bataev
79d83636b9 [SLP][NFC]Fix variables names, NFC. 2021-07-14 12:43:45 -07:00
Fangrui Song
f00933b0e4 [docs] Fix :option:--file-header reference in llvm-readelf.rst after D105532 2021-07-14 12:39:22 -07:00
Simon Pilgrim
dd97f9edd3 [SLP] Fix case of variable name. NFCI. 2021-07-14 20:20:04 +01:00
Roman Lebedev
60d35e606c [NFC] Drop redundant check prefixes in newly added test file 2021-07-14 22:14:36 +03:00
Nikita Popov
0b8ca13018 [Attributes] Use single method to fetch type from AttributeSet (NFC)
While it is nice to have separate methods in the public AttributeSet
API, we can fetch the type from the internal AttributeSetNode
using a generic API for all type attribute kinds.
2021-07-14 21:10:56 +02:00
Roman Lebedev
c64b97d8cf [NFC][PhaseOrdering] Add test for the lack of CSE after SimplifyCFG (PR51092) 2021-07-14 22:07:38 +03:00
David Green
5da5b28644 [ARM] Move add(VMLALVA(A, X, Y), B) to VMLALVA(add(A, B), X, Y)
For i64 reductions we currently try and convert add(VMLALV(X, Y), B) to
VMLALVA(B, X, Y), incorporating the addition into the VMLALVA. If we
have an add of an existing VMLALVA, this patch pushes the add up above
the VMLALVA so that it may potentially be simplified further, for
example being folded into another VMLALV.

Differential Revision: https://reviews.llvm.org/D105686
2021-07-14 20:06:49 +01:00
Nikita Popov
b227a46d3d [Verifier] Improve incompatible attribute type check
A couple of attributes had explicit checks for incompatibility
with pointer types. However, this is already handled generically
by the typeIncompatible() check. We can drop these after adding
SwiftError to typeIncompatible().

However, the previous implementation of the check prints out all
attributes that are incompatible with a given type, even though
those attributes aren't actually used. This has the annoying
result that the error message changes every time a new attribute
is added to the list. Improve this by explicitly finding which
attribute isn't compatible and printing just that.
2021-07-14 21:02:10 +02:00
Saleem Abdulrasool
ba012f02f8 Demangle: correct swift_async demangling for Microsoft scheme
The emission was corrected for the swift_async calling convention but
the demangling support was not.  This repairs the demangling support as
well.
2021-07-14 11:43:44 -07:00
Eli Friedman
0af449d2a7 [SelectionDAG] Add an overload of getStepVector that assumes step 1.
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).

Differential Revision: https://reviews.llvm.org/D105850
2021-07-14 11:37:01 -07:00
Thomas Lively
cf44692539 [WebAssembly] Codegen for v128.loadX_lane instructions
Replace the experimental clang builtin and LLVM intrinsics for these
instructions with normal codegen patterns. Resolves PR50433.

Differential Revision: https://reviews.llvm.org/D105950
2021-07-14 11:31:53 -07:00
Thomas Lively
81bb5f99ad [WebAssembly] Remove datalayout strings from llc tests
The data layout strings do not have any effect on llc tests and will become
misleadingly out of date as we continue to update the canonical data layout, so
remove them from the tests.

Differential Revision: https://reviews.llvm.org/D105842
2021-07-14 11:17:08 -07:00
David Green
bb9a6d3734 [ARM] Lower v16i8 -> i64 VMLA reductions.
MVE does not have a VMLALV instruction that can perform v16i8 -> i64
reductions, like it does for v8i16->i64 and v4i32->i64 reductions. That
means that the pattern to create them will be spilt up by type
legalization, creating a lot of instructions.

This extends the patterns for matching i64 reductions a little to handle
the v16i8->i64 case. We need to turn them into a pair of v8i16->i64
VMLALVs that each perform half of the reduction and are summed together
(so the later is a VMLALVA). The order of the lanes does not matter for
the reduction so we generate a MVEEXT for the extension, that will
either be folded into a extending load or can be optimized to a
VREV/VMOVL. Some of the resulting codegen isn't optimal, but will be
improved in a later patch.

Differential Revision: https://reviews.llvm.org/D105680
2021-07-14 18:11:32 +01:00
Sanjay Patel
cd35b5a262 [InstCombine] reorder icmp with offset folds for better results
This set of folds was added recently with:
c7b658aeb526
0c400e895306
40b752d28d95

...and I noted that this wasn't likely to fire in code derived
from C/C++ source because of nsw in particular. But I didn't
notice that I had placed the code above the no-wrap block
of transforms.

This is likely the cause of regressions noted from the previous
commit because -- as shown in the test diffs -- we may have
transformed into a compare with an arbitrary constant rather
than a simpler signbit test.
2021-07-14 12:12:05 -04:00
Sanjay Patel
9552511c41 [InstCombine] add tests for icmp with constant offset and no-wrap flags; NFC 2021-07-14 12:12:05 -04:00
Sander de Smalen
0fddd6dc0c [LV] Print remark when loop cannot be vectorized due to invalid costs.
This patch emits remarks for instructions that have invalid costs for
a given set of vectorization factors. Some example output:

  t.c:4:19: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): load
      dst[i] = sinf(src[i]);
                    ^
  t.c:4:14: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1, vscale x 2, vscale x 4): call to llvm.sin.f32
      dst[i] = sinf(src[i]);
               ^
  t.c:4:12: remark: Instruction with invalid costs prevented vectorization at VF=(vscale x 1): store
      dst[i] = sinf(src[i]);
             ^

Reviewed By: fhahn, kmclaughlin

Differential Revision: https://reviews.llvm.org/D105806
2021-07-14 17:11:33 +01:00
Matt Arsenault
496d9d6a80 GlobalISel: Handle lowering non-power-of-2 extloads 2021-07-14 11:54:11 -04:00
Sander de Smalen
d616b58c4d [CostModel][AArch64] Make loads/stores of <vscale x 1 x eltty> invalid.
At the moment, <vscale x 1 x eltty> are not yet fully handled by the
code-generator, so to avoid vectorizing loops with that VF, we mark the
cost for these types as invalid.
The reason for not adding a new "TTI::getMinimumScalableVF" is because
the type is supposed to be a type that can be legalized. It partially is,
although the support for these types need some more work.

Reviewed By: paulwalker-arm, dmgreen

Differential Revision: https://reviews.llvm.org/D103882
2021-07-14 16:44:22 +01:00
Jay Foad
954a1c2142 [AMDGPU] Check llc-pipeline.ll with -match-full-lines -strict-whitespace
This prevents breaking the indentation that shows the structure of the
pass managers.

Differential Revision: https://reviews.llvm.org/D105891
2021-07-14 16:33:50 +01:00
Alexey Bataev
328393368a [SLP]Workaround for InsertSubVector cost.
The cost of the InsertSubvector shuffle kind cost is not complete and
may end up with just extracts + inserts costs in many cases. Added
a workaround to represent it as a generic PermuteSingleSrc, which is
still pessimistic but better than InsertSubvector.

Differential Revision: https://reviews.llvm.org/D105827
2021-07-14 07:54:24 -07:00
Jinsong Ji
8853fbe08a [AIX] Enable dollar sign as PC in inlineasm
$ is used as PC for PowerPC inlineasm, ELF use it,
enable it for AIX XCOFF as well.

Reviewed By: #powerpc, amyk, nemanjai

Differential Revision: https://reviews.llvm.org/D105956
2021-07-14 13:37:52 +00:00
oToToT
9fc72d07d0 [docs] Update CMake cross compiling guide link
The CMake community Wiki has been moved to the [[ https://gitlab.kitware.com/cmake/community/wikis/home | Kitware GitLab Instance ]].
Also, the original anchor for `Information how to set up various cross compiling toolchains` section might not work as expected. The original content is now being collapsed, so browser won't navigate to the right section directly.

Hence, I think it might be better to provide the section name instead of `this section` with link to help readers find the right section by themselves.

Reviewed By: void

Differential Revision: https://reviews.llvm.org/D104996
2021-07-14 21:19:42 +08:00
Tim Northover
daf4db0c98 ARM: reuse existing libcall global variable if possible.
If we try to create a new GlobalVariable on each iteration, the Module will
detect the name collision and "helpfully" rename later iterations by appending
".1" etc. But "___udivsi3.1" doesn't exist and we definitely don't want to try
to call it.

So instead check whether there's already a global with the right name in the
module and use that if so.
2021-07-14 14:14:47 +01:00
Sanjay Patel
df0b8763a2 [SLP] match logical and/or as reduction candidates
This has been a work-in-progress for a long time...we finally have all of
the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and
InstCombine to the poison-safe (select) forms of the logic ops, so now we
need to have SLP recognize those patterns and insert a freeze op to make
a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests
show that we still need adjustments to get the ideal IR in some or all of
the motivating cases.

Differential Revision: https://reviews.llvm.org/D105730
2021-07-14 09:02:31 -04:00
Djordje Todorovic
c793732c01 [RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.

After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before the LiveDebugValues)
variables assignment, but it is being clobbered for function
parameters during the SelectionDAG since it generates new DBG_VALUEs
after COPY instructions, even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, and a COPY and after the COPY,
DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets
rewritten into $regX, we'd end up having redundant DBG_VALUE.

This breaks the definition of the DBG_VALUE since some analysis passes
might be built on top of that premise..., and this patch tries to fix
the MIR with the respect to that.

This first patch performs bacward scan, by trying to detect a sequence of
consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one
variable but the last one:

For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
 ...

in this case, we can remove (1).

By combining the forward scan that will be introduced in the next patch
(from this stack), by inspecting the statistics, the RemoveRedundantDebugValues
removes 15032 instructions by using gdb-7.11 as a testbed.

Differential Revision: https://reviews.llvm.org/D105279
2021-07-14 04:29:42 -07:00
Simon Pilgrim
e0937686e0 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183) (REAPPLIED)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Reapplied with a fix for the bpf "-bpf-disable-avoid-speculation" tests

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 12:21:01 +01:00
Chuanqi Xu
6ee3360389 [NFC] [Coroutines] Remove unused CoroFree 2021-07-14 19:13:12 +08:00
Simon Pilgrim
32c30ddee7 [X86] Implement smarter instruction lowering for FP_TO_UINT from f32/f64 to i32/i64 and vXf32/vXf64 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.
We know that "CVTTPS2SI" returns 0x80000000 for out of range inputs (and for FP_TO_UINT, negative float values are undefined). We can use this to make unsigned conversions from vXf32 to vXi32 more efficient, particularly on targets without blend using the following logic:

small := CVTTPS2SI(x);
fp_to_ui(x) := small | (CVTTPS2SI(x - 2^31) & ARITHMETIC_RIGHT_SHIFT(small, 31))

Even on targets where "PBLENDVPS"/"PBLENDVB" exists, it is often a latency 2, low throughput instruction so this logic is applied there too (in particular for AVX2 also). It furthermore gets rid of one high latency floating point comparison in the previous lowering.

@TomHender checked the correctness of this for all possible floats between -1 and 2^32 (both ends excluded).

Original Patch by @TomHender (Tom Hender)

Differential Revision: https://reviews.llvm.org/D89697
2021-07-14 12:03:49 +01:00
LLVM GN Syncbot
b00ffab852 [gn build] Port c08dabb0f476 2021-07-14 10:49:08 +00:00
Simon Pilgrim
9be15018bd Revert rGb803294cf78714303db2d3647291a2308347ef23 : "[InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)"
Missed some BPF test changes that need addressing
2021-07-14 11:48:37 +01:00
Nico Weber
0eb0133948 [gn build] (manually) merge 462d4de35b0c 2021-07-14 06:43:23 -04:00
Stefan Pintilie
d3442926d3 [NFC][PowerPC] Added test to check regsiter allocation for ACC registers
ACC regsiters are a combination of 4 consecutive vector regsiters and therefore
somtimes require special treatment for register allocation. This patch only
adds a test.
2021-07-14 05:22:24 -05:00
Stephen Tozer
6422696166 [DebugInfo] Correctly update dbg.values with duplicated location ops
This patch fixes code that incorrectly handled dbg.values with duplicate
location operands, i.e. !DIArgList(i32 %a, i32 %a). The errors in
question were caused by either applying an update to dbg.value multiple
times when the update is only valid once, or by updating the
DIExpression for only the first instance of a value that appears
multiple times.

Differential Revision: https://reviews.llvm.org/D105831
2021-07-14 11:17:24 +01:00
Simon Pilgrim
417d5c9876 [InstCombine] Fold (select C, (gep Ptr, Idx), Ptr) -> (gep Ptr, (select C, Idx, 0)) (PR50183)
As discussed on PR50183, we already fold to prefer 'select-of-idx' vs 'select-of-gep':

define <4 x i32>* @select0a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %gep0 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %gep1 = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a3
  %sel = select i1 %a2, <4 x i32>* %gep0, <4 x i32>* %gep1
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1a(<4 x i32>* %a0, i64 %a1, i1 %a2, i64 %a3) {
  %sel = select i1 %a2, i64 %a1, i64 %a3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

This patch adds basic handling for the 'fallthrough' cases where the gep idx == 0 has been folded away to the base address:

define <4 x i32>* @select0(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %a1
  %sel = select i1 %a2, <4 x i32>* %a0, <4 x i32>* %gep
  ret <4 x i32>* %sel
}
-->
define <4 x i32>* @select1(<4 x i32>* %a0, i64 %a1, i1 %a2) {
  %sel = select i1 %a2, i64 0, i64 %a1
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %a0, i64 %sel
  ret <4 x i32>* %gep
}

Differential Revision: https://reviews.llvm.org/D105901
2021-07-14 10:49:12 +01:00