1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

191357 Commits

Author SHA1 Message Date
David Green
96b3c41b3d [ARM] Add extra use test for MVE VPT blocks. NFC 2020-02-05 18:32:18 +00:00
Matt Arsenault
dffd3d8c49 GlobalISel: Assume G_INTRINSIC* are convergent
This is safer in case anyone tries to run MI optimization passes on
pre-selected MIR. If there turns out to be a real reason to do this,
we might need to add separate convergent intrinsic opcodes.
2020-02-05 10:17:22 -08:00
LLVM GN Syncbot
8c3a57058a [gn build] Port fc62b36a000 2020-02-05 18:06:25 +00:00
Nick Desaulniers
686a91a786 [llvm-reduce] add ReduceAttribute delta pass
Summary:
The output from llvm-reduce still has significantly more attributes than
bugpoint does.  Teach llvm-reduce to remove attributes.

Reviewers: diegotf, dblaikie, george.burgess.iv

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73853
2020-02-05 10:05:25 -08:00
Jessica Paquette
fe54d96576 [AArch64][GlobalISel] Fold G_ASHR into TB(N)Z bit calculation
This implements walking over G_ASHR in the same way as `getTestBitOperand` in
AArch64ISelLowering.

```
(tbz (ashr x, c), b) -> (tbz x, b+c) or (tbz x, msb) if b+c is > # bits in x
```

Differential Revision: https://reviews.llvm.org/D73933
2020-02-05 10:04:48 -08:00
Christopher Tetreault
c5dc9d3b15 Reapply: [SVE] Fix bug in simplification of scalable vector instructions
This reverts commit a05441038a3a4a011b9421751367c5c797d57137, reapplying
commit 31574d38ac5fa4646cf01dd252a23e682402134f
2020-02-05 10:00:09 -08:00
Petr Hosek
0e71bf0b60 [CMake] Filter libc++abi and libunwind from runtimes build in MSVC
These don't build on MSVC at the moment, so filter these out altogether
from the list of runtimes and print a warning.

Differential Revision: https://reviews.llvm.org/D73812
2020-02-05 09:59:06 -08:00
Matt Arsenault
df0d66718e AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE
These have a better chance of combining with other operations and are
currently much better supported than G_EXTRACT.
2020-02-05 12:56:10 -05:00
Jessica Paquette
bdc86b5318 [AArch64][GlobalISel] Fix one use check in getTestBitReg
(1) The check needs to be on the 0th operand of whatever we're folding
(2) Checks for validity should happen before we change the bit

Fixes a bug which caused MultiSource/Applications/JM/lencod to fail at -O3.

Differential Revision: https://reviews.llvm.org/D74002
2020-02-05 09:54:52 -08:00
Matt Arsenault
657b413bfa AMDGPU/GlobalISel: Legalize TFE image result loads
Rewrite the result register pair into the expected sinigle register
format in the legalizer.

I'm also operating under the assumption that TFE doesn't apply to
stores or atomics, but don't know if this is true or not.
2020-02-05 12:40:20 -05:00
Hiroshi Yamauchi
fcd3bf40ff [PGO][PGSO] Tune flags for profile guided size optimization.
Summary:
Tune the profile threshold flag value for instrumentation PGO based on internal
benchmarks.

Also, add flags to allow profile guided size optimizations for non-cold code
to be enabled separately for instrumentation and sample PGSO.

Neither changes the default behavior (yet) as it's disabled for non-cold code.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72937
2020-02-05 09:37:32 -08:00
Matt Arsenault
0861faad9c AMDGPU: Fix divergence analysis of control flow intrinsics
The mask results of these should be uniform. The trickier part is the
dummy booleans used as IR glue need to be treated as divergent. This
should make the divergence analysis results correct for the IR the DAG
is constructed from.

This should allow us to eliminate requiresUniformRegister, which has
an expensive, recursive scan over all users looking for control flow
intrinsics. This should avoid recent compile time regressions.
2020-02-05 09:30:54 -08:00
Jordan Rupprecht
88aab300b4 NFC: fix unused var warnings in no-assert builds 2020-02-05 09:26:59 -08:00
Kazu Hirata
d766c964c9 Resubmit^2: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 41784bed01543315a1d03141e6ddc023fd914c0b.

Since the original revision ead815924e6ebeaf02c31c37ebf7a560b5fdf67b,
this revision fixes three issues:

- This revision fixes the Windows build.  My original patch improperly
  copied EH pads on Windows.  This patch disregards jump threading
  opportunities having to do with EH pads.

- This revision fixes jump threading to a wrong destination.
  Specifically, my original patch treated any Constant other than 0 as 1
  while evaluating the branch condition.  This bug led to treating
  constant expressions like:

    icmp ugt i8* null, inttoptr (i64 4 to i8*)

  to "true".  This patch fixes the bug by calling isOneValue.

- This revision fixes the cost calculation of two basic blocks being
  threaded through.  Note that getJumpThreadDuplicationCost returns
  "(unsigned)~0" for those basic blocks that cannot be duplicated.  If
  we sum of two return values from getJumpThreadDuplicationCost, we
  could have an unsigned overflow like:

    (unsigned)~0 + 5 = 4

  and mistakenly determine that it's safe and profitable to proceed
  with the jump threading opportunity.  The patch fixes the bug by
  checking each return value before summing them up.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-02-05 09:23:37 -08:00
Alina Sbirlea
88ae466cc3 [IRCE] Make IRCE a Function pass.
Summary: Make InductiveRangeCheckElimination a FunctionPass.

Reviewers: reames, mkazantsev

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73592
2020-02-05 09:22:41 -08:00
LLVM GN Syncbot
9d09a8e119 [gn build] Port b198f16e1e1 2020-02-05 17:03:12 +00:00
Matt Arsenault
2e7059e936 AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load
The 96-bit results need to be widened.

I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
2020-02-05 12:01:34 -05:00
Teresa Johnson
addd1606be [WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.

This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).

This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.

Depends on D71913.

Reviewers: pcc, evgeny777

Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73242
2020-02-05 08:59:48 -08:00
Matt Arsenault
b46a1225fc AMDGPU/GlobalISel: Fix processing new phi in waterfall loop
The adjusted iterator range included the last we just inserted, and
don't want to process. Figure out the new iterator range before
inserting phis. This was a harmless problem, but added an unnecessary
complication for a future patch.
2020-02-05 11:52:42 -05:00
Matt Arsenault
1da609be66 GlobalISel: Make LegalizerHelper primitives public
I want to re-use widenScalarDst/moreElementsVectorDst directly.
2020-02-05 11:52:18 -05:00
Matt Arsenault
4ba6a5e05e AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.

If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.

We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.

This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
2020-02-05 11:52:18 -05:00
Momchil Velikov
ad184987b8 [ARM][TargetParser] Improve handling of dependencies between target features
The patch at https://reviews.llvm.org/D64048 added "negative"
dependency handling in `ARM::appendArchExtFeatures`: feature "noX"
removes all features, which imply "X".

This patch adds the "positive" handling: feature "X" adds all the
feature strings implied by "X".

(This patch also comes from the suggestion here
https://reviews.llvm.org/D72633#inline-658582)

Differential Revision: https://reviews.llvm.org/D72762
2020-02-05 16:07:51 +00:00
Alex Richardson
16edf004ac Re-enable a update_cc_test_checks.py tests
This test was not running because it still had a REQUIRES: python3 line.
As this is no longer necessary, remove the REQUIRES to run the test
again.
2020-02-05 15:37:30 +00:00
Sjoerd Meijer
9711abe6d3 [ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.

Differential Revision: https://reviews.llvm.org/D73947
2020-02-05 15:15:46 +00:00
Momchil Velikov
38241fc5d7 [ARM] Correct syntax of the CLRM insn
The predicate should be adjacent to the opcode.

Differential Revision: https://reviews.llvm.org/D74040
2020-02-05 13:54:34 +00:00
Andrea Di Biagio
6d0e7e12a0 [MCA] Remove verification check on MayLoad and MayStore. NFCI
Field NumMicroOpcodes is currently used by mca to model the number of uOPs
dispatched from the uOp-Queue to the out of order backend.  From a 'dispatch'
point of view, an instruction with zero opcodes is still valid; it simply
doesn't consume any dispatch group slots.

However, mca doesn't expect an instruction with zero uOPs to consume pipeline
resources because it is seen as a contradiction.  In practice, it only makes
sense if such an instruction is eliminated and never really executed. It may be
that mca is being too conservative here. However I believe that mca is right,
and we should probably check that inconsistency in CodeGenSchedule.cpp (when we
also verify scheduling classes in general).

This patch removes the check for MayLoad and MayStore in mca.  That check is
probably too conservative: we are already checking if a zero-uops instruction
consumes any processor resources. Note also that instructions with unmodelled
side-effects also tend to set the MayLoad/MayStore flags even if - theoretically
speaking - they might not even consume any hw resources in practice.

In future we may want to implement different checks (possibly outside of mca)
and potentially revisit the logic in mca that verifies instructions.
For that reason I have raised PR44797.
2020-02-05 13:50:01 +00:00
Simon Pilgrim
29c63c6731 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Sam Parker
3b7b1e369e [ARM][LowOverheadLoops] Fix loop count chain
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to addressed later on.

Differential Revision: https://reviews.llvm.org/D74037
2020-02-05 13:21:51 +00:00
Sam Parker
4a2128748f [ARM][LowOverheadLoops] Ensure memory predication
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.

Differential Revision: https://reviews.llvm.org/D73616
2020-02-05 13:19:08 +00:00
Simon Pilgrim
b6af7e47c1 [X86] Fix missing load latencies (PR36894)
We weren't account for load latencies in the SSE42/AES/CLMUL schedule classes
2020-02-05 11:53:16 +00:00
Ayke van Laethem
0b73574e68 [AVR] Add disassembly tests for supported instructions
The disassembler of the AVR backend is incomplete: most instructions do
not correctly disassemble yet.

This patch is the first in a series to add disassembly support to the
AVR backend. It starts with adding disassembler tests for instructions
that already disassemble correctly.

Differential Revision: https://reviews.llvm.org/D73911
2020-02-05 12:38:51 +01:00
Martin Storsjö
73cf6c4a89 Partially revert c1c9819ef91aab51b5a23fb3027adac5a2f551cc
Revert the part of that change that broke the
test Passes/./PluginsTests/PluginsTests.LoadPlugin.
2020-02-05 13:29:48 +02:00
Martin Storsjö
d6b7a01674 [CMake] Add missing component dependencies, to fix building for mingw with BUILD_SHARED_LIBS
Differential Revision: https://reviews.llvm.org/D73840
2020-02-05 13:10:27 +02:00
Sebastian Neubauer
097cac0b5f [AMDGPU] Fix lowering a16 image intrinsics
scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers

Differential Revision: https://reviews.llvm.org/D73482
2020-02-05 10:54:34 +01:00
Sebastian Neubauer
c9e54cbc12 [AMDGPU] Use v3f32 type in image instructions
This should lower the amount of used registers for gfx9.

I updated some of the changed tests with the update script because
changing them by hand is tedious.

Differential Revision: https://reviews.llvm.org/D73884
2020-02-05 10:35:41 +01:00
Georgii Rymar
38ad9710a6 [yaml2obj][obj2yaml] - Simplify format of the SHT_LLVM_ADDRSIG section.
Previously the description allowed to describe symbols with use of
`Name` and `Index` keys. This patch removes them and now it is still
possible to use either names or symbol indexes, but the code is simpler
and the format is slightly different.

Such a change will be useful for another patches, e.g:
https://reviews.llvm.org/D73788#inline-671077

Differential revision: https://reviews.llvm.org/D73888
2020-02-05 12:33:14 +03:00
Djordje Todorovic
73925edae1 [DebugInfo] Avoid the call site param for mem instrs with multiple defs
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.

Differential Revision: https://reviews.llvm.org/D73954
2020-02-05 10:03:14 +01:00
Craig Topper
145a539e76 [X86] Add a DAG combine for (i32 (sext (i8 (x86isd::setcc_carry)))) -> (i32 (x86isd::setcc_carry)) and remove isel patterns.
Same for any_extend though we don't have coverage for that.

The test changes are because isel didn't check one use of the
setcc_carry. So in isel we would end up with two different
sized setcc_carry instructions. And since it clobbers
the flags we would need to recreate the flags for the second
instruction.

This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
2020-02-04 22:40:36 -08:00
Petr Hosek
ff60cb9276 [CMake] Passthrough CMAKE_SYSTEM_NAME to default builtin and runtimes target
When building the default builtin and runtimes target, set the
CMAKE_SYSTEM_NAME to the current one. This is not necessary on
Linux and Darwin, but it appears to be necessary on Windows,
otherwise CMake fails.

Differential Revision: https://reviews.llvm.org/D73811
2020-02-04 22:38:20 -08:00
Jan Vesely
d881d8ad71 AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.

Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
2020-02-05 00:24:07 -05:00
Thomas Lively
352e779087 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586f4d14efe690c23c878d5aa92a80eb5. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
Matt Arsenault
bce6c219c5 AMDGPU: Correct memory size for image intrinsics
This was incorrectly rounding up to the next power of 2. v4f32 was
rounding up to v8f32, which was just wrong. There are also v3i16/v3f16
available in MVT, so we don't even need to round the f16 cases
anymore. Additionally, this field is really an EVT so we don't even
need to consider this.

Also switch some asserts to return invalid. We should have an IR
verifier for these intrinsic return types, but for now it's better to
not assert on IR that passes the verifier.

This should also probably be fixed to consider that dmask is really
eliminating some of the loaded components.
2020-02-04 22:29:23 -05:00
David Blaikie
754b9ad300 DebugInfo: Hash DW_OP_convert in loclists when using Split DWARF
Originally committed in: 1ced28cbe75ff81f35ac2c71e941041eb3afcd00
            Reverted in: f75301d16d444d8cb6810d679290df744bc79ec7

(reverted due to tests failing on non-linux/x86 targets, tests have since been
generalized and specialized... since Split DWARF isn't supported on non-elf
targets anyway and we have no way to run on "whatever elf target is available"
so they fail on MacOS without an explicit target triple)

This code was incorrectly emitting extra bytes into arbitrary parts of
the object file when it was meant to be hashing them to compute the DWO
ID.

Follow-up patch(es) will refactor this API somewhat to make such bugs
harder to introduce, hopefully.
2020-02-04 19:25:47 -08:00
David Blaikie
e8f41bc32c DebugInfo: Add a couple of missing COFF sections to make convert-loclist.ll pass on Windows 2020-02-04 19:23:57 -08:00
David Blaikie
ff32271704 DebugInfo: convert-debugloc.ll generalize to run on ppc64le
This target produces a location list for the location, so split the
match between lines to allow for a location list match.
2020-02-04 19:14:22 -08:00
David Blaikie
ef2854822b DebugInfo: Fix convert-loclist.ll Split DWARF variant to use a hardcoded triple
Since we don't support Split DWARF emission on non-ELF formats, hardcode
an elfine triple (we don't have a way to ask for "any ELF triple" it
seems, so hardcoded will have to do)
2020-02-04 19:02:11 -08:00
Thomas Lively
fb5e7af6ce Revert "[WebAssembly] Split and recombine multivalue calls for ISel"
Summary:
This reverts commit 28857d14a86b1e99a9d2795636a5faf17674f5a2. This
commit worked toward a solution that did not turn out to be feasible
because MachineInstrs cannot contain an arbitrary number of defs.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73927
2020-02-04 18:46:43 -08:00
Yonghong Song
a635edbb9b [BPF] disable ReduceLoadWidth during SelectionDag phase
The compiler may transform the following code
  ctx = ctx + reloc_offset
  ... (*(u32 *)ctx) & 0x8000 ...
to
  ctx = ctx + reloc_offset
  ... (*(u8 *)(ctx + 1)) & 0x80 ...
where reloc_offset will be replaced with a constant during
AsmPrinter phase.

The above transformed code will be rejected the kernel verifier
as it does not allow
  *(type *)((ctx + non_zero_offset1) + non_zero_offset2)
style access pattern.

It is hard at SelectionDag phase to identify whether a load
is related to context or not. Sometime, interprocedure analysis
may be needed. So let us simply prevent such optimization
from happening.

Differential Revision: https://reviews.llvm.org/D73997
2020-02-04 18:37:43 -08:00
David Blaikie
839fcb753d Recommit: DebugInfo: Check DW_OP_convert in loclists with Split DWARF
Originally committed in: 552a8fe12bd1822f48dda2e9e8728a179f82d356
            Reverted in: f75301d16d444d8cb6810d679290df744bc79ec7

Reverted because it was running llc directly (rather than %llc_dwarf)
which uses COFF files on Windows which LLVM doesn't support all DWARF
features in.

This functionality isn't fully working, but sets up the testing for a
follow-on patch that demonstrates and fixes the brokenness related to
DWO ID hashing this construct.
2020-02-04 18:37:06 -08:00
Thomas Lively
50202c53b2 [WebAssembly] Enable recently implemented SIMD operations
Summary:
Moves a batch of instructions from unimplemented-simd128 to simd128
because they have recently become available in V8.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73926
2020-02-04 18:36:32 -08:00