1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
Commit Graph

191565 Commits

Author SHA1 Message Date
Victor Huang
bc814804ef [PowerPC][NFC] Clang-format on commit 4b414d 2020-02-05 13:47:54 -06:00
Adrian McCarthy
3ec1d2b411 [VFS] More consistent support for Windows
Removed some #ifdefs specific to Windows handling of VFS paths.  This
eliminates most of the differences between the Windows and non-Windows
code paths.

Making this work required some changes to account for the fact that VFS
file paths can be Posix style or Windows style, so you cannot just assume
that they use the host's native path style.  In one case, this means
implementing our own version of make_absolute, since the filesystem code
in Support doesn't have styles in the sense that the path code does.

Differential Review: https://reviews.llvm.org/D71092
2020-02-05 11:38:20 -08:00
Matt Arsenault
45fe1cf69f AMDGPU/GlobalISel: Legalize f64 G_FFLOOR for SI
Use cmp ord instead of cmp_class compared to the DAG version for the
nan check, but mostly try to match the existsing pattern.

I think the sign doesn't matter for fract, so we could do a little
better with the source modifier matching.

I think this is also still broken as in D22898, but I'm leaving it
as-is for now while I don't have an SI system to test on.
2020-02-05 14:32:01 -05:00
Nate Voorhies
35f8a76c2c [NFC][RISCV] Fixing typo in comment.
Reviewers: luismarques, lenary

Reviewed By: lenary

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73984
2020-02-05 11:30:11 -08:00
LLVM GN Syncbot
9d6725a8d3 [gn build] Port b12176d2aaf 2020-02-05 19:16:15 +00:00
Nico Weber
878db5393d Revert "[llvm-reduce] add ReduceAttribute delta pass"
This reverts commit fc62b36a000681c01e993242b583c5ec4ab48a3c.
Breaks tests on mac: http://45.33.8.238/mac/7301/step_11.txt
2020-02-05 14:15:11 -05:00
Jan Korous
4a2256fd57 Revert "Activate extension loading test on Darwin now that the underlying fix has landed"
This reverts commit 058070893428a480b76a137f647ae6b9c89ac2d4.
2020-02-05 11:04:38 -08:00
Shu-Chun Weng
7269103002 [GlobalISel][AArch64] Fix contract cross-bank copies with SIMD instructions
contractCrossBankCopyIntoStore() finds the instruction defines the
source register and uses its output to replace the register. There are,
however, instructions that have multiple outputs, e.g. G_UNMERGE_VALUES.
Current implementation hardcodes to operand 0 and has no way of knowing
which output should be used.

This change adds another function to directly return the register that
is the source of the register and use that for folding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=44783

Differential Revision: https://reviews.llvm.org/D74005
2020-02-05 10:38:35 -08:00
David Green
96b3c41b3d [ARM] Add extra use test for MVE VPT blocks. NFC 2020-02-05 18:32:18 +00:00
Matt Arsenault
dffd3d8c49 GlobalISel: Assume G_INTRINSIC* are convergent
This is safer in case anyone tries to run MI optimization passes on
pre-selected MIR. If there turns out to be a real reason to do this,
we might need to add separate convergent intrinsic opcodes.
2020-02-05 10:17:22 -08:00
LLVM GN Syncbot
8c3a57058a [gn build] Port fc62b36a000 2020-02-05 18:06:25 +00:00
Nick Desaulniers
686a91a786 [llvm-reduce] add ReduceAttribute delta pass
Summary:
The output from llvm-reduce still has significantly more attributes than
bugpoint does.  Teach llvm-reduce to remove attributes.

Reviewers: diegotf, dblaikie, george.burgess.iv

Subscribers: mgorny, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73853
2020-02-05 10:05:25 -08:00
Jessica Paquette
fe54d96576 [AArch64][GlobalISel] Fold G_ASHR into TB(N)Z bit calculation
This implements walking over G_ASHR in the same way as `getTestBitOperand` in
AArch64ISelLowering.

```
(tbz (ashr x, c), b) -> (tbz x, b+c) or (tbz x, msb) if b+c is > # bits in x
```

Differential Revision: https://reviews.llvm.org/D73933
2020-02-05 10:04:48 -08:00
Christopher Tetreault
c5dc9d3b15 Reapply: [SVE] Fix bug in simplification of scalable vector instructions
This reverts commit a05441038a3a4a011b9421751367c5c797d57137, reapplying
commit 31574d38ac5fa4646cf01dd252a23e682402134f
2020-02-05 10:00:09 -08:00
Petr Hosek
0e71bf0b60 [CMake] Filter libc++abi and libunwind from runtimes build in MSVC
These don't build on MSVC at the moment, so filter these out altogether
from the list of runtimes and print a warning.

Differential Revision: https://reviews.llvm.org/D73812
2020-02-05 09:59:06 -08:00
Matt Arsenault
df0d66718e AMDGPU/GlobalISel: Prefer merge/unmerge ops to legalize TFE
These have a better chance of combining with other operations and are
currently much better supported than G_EXTRACT.
2020-02-05 12:56:10 -05:00
Jessica Paquette
bdc86b5318 [AArch64][GlobalISel] Fix one use check in getTestBitReg
(1) The check needs to be on the 0th operand of whatever we're folding
(2) Checks for validity should happen before we change the bit

Fixes a bug which caused MultiSource/Applications/JM/lencod to fail at -O3.

Differential Revision: https://reviews.llvm.org/D74002
2020-02-05 09:54:52 -08:00
Matt Arsenault
657b413bfa AMDGPU/GlobalISel: Legalize TFE image result loads
Rewrite the result register pair into the expected sinigle register
format in the legalizer.

I'm also operating under the assumption that TFE doesn't apply to
stores or atomics, but don't know if this is true or not.
2020-02-05 12:40:20 -05:00
Hiroshi Yamauchi
fcd3bf40ff [PGO][PGSO] Tune flags for profile guided size optimization.
Summary:
Tune the profile threshold flag value for instrumentation PGO based on internal
benchmarks.

Also, add flags to allow profile guided size optimizations for non-cold code
to be enabled separately for instrumentation and sample PGSO.

Neither changes the default behavior (yet) as it's disabled for non-cold code.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72937
2020-02-05 09:37:32 -08:00
Matt Arsenault
0861faad9c AMDGPU: Fix divergence analysis of control flow intrinsics
The mask results of these should be uniform. The trickier part is the
dummy booleans used as IR glue need to be treated as divergent. This
should make the divergence analysis results correct for the IR the DAG
is constructed from.

This should allow us to eliminate requiresUniformRegister, which has
an expensive, recursive scan over all users looking for control flow
intrinsics. This should avoid recent compile time regressions.
2020-02-05 09:30:54 -08:00
Jordan Rupprecht
88aab300b4 NFC: fix unused var warnings in no-assert builds 2020-02-05 09:26:59 -08:00
Kazu Hirata
d766c964c9 Resubmit^2: [JumpThreading] Thread jumps through two basic blocks
This reverts commit 41784bed01543315a1d03141e6ddc023fd914c0b.

Since the original revision ead815924e6ebeaf02c31c37ebf7a560b5fdf67b,
this revision fixes three issues:

- This revision fixes the Windows build.  My original patch improperly
  copied EH pads on Windows.  This patch disregards jump threading
  opportunities having to do with EH pads.

- This revision fixes jump threading to a wrong destination.
  Specifically, my original patch treated any Constant other than 0 as 1
  while evaluating the branch condition.  This bug led to treating
  constant expressions like:

    icmp ugt i8* null, inttoptr (i64 4 to i8*)

  to "true".  This patch fixes the bug by calling isOneValue.

- This revision fixes the cost calculation of two basic blocks being
  threaded through.  Note that getJumpThreadDuplicationCost returns
  "(unsigned)~0" for those basic blocks that cannot be duplicated.  If
  we sum of two return values from getJumpThreadDuplicationCost, we
  could have an unsigned overflow like:

    (unsigned)~0 + 5 = 4

  and mistakenly determine that it's safe and profitable to proceed
  with the jump threading opportunity.  The patch fixes the bug by
  checking each return value before summing them up.

[JumpThreading] Thread jumps through two basic blocks

Summary:
This patch teaches JumpThreading.cpp to thread through two basic
blocks like:

  bb3:
    %var = phi i32* [ null, %bb1 ], [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

by duplicating basic blocks like bb3 above.  Once we duplicate bb3 as
bb3.dup and redirect edge bb2->bb3 to bb2->bb3.dup, we have:

  bb3:
    %var = phi i32* [ @a, %bb2 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb3.dup:
    %var = phi i32* [ null, %bb1 ]
    %tobool = icmp eq i32 %cond, 0
    br i1 %tobool, label %bb4, label ...

  bb4:
    %cmp = icmp eq i32* %var, null
    br i1 %cmp, label bb5, label bb6

Then the existing code in JumpThreading.cpp can thread edge
bb3.dup->bb4 through bb4 and eventually create bb3.dup->bb5.

Reviewers: wmi

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70247
2020-02-05 09:23:37 -08:00
Alina Sbirlea
88ae466cc3 [IRCE] Make IRCE a Function pass.
Summary: Make InductiveRangeCheckElimination a FunctionPass.

Reviewers: reames, mkazantsev

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73592
2020-02-05 09:22:41 -08:00
LLVM GN Syncbot
9d09a8e119 [gn build] Port b198f16e1e1 2020-02-05 17:03:12 +00:00
Matt Arsenault
2e7059e936 AMDGPU/GlobalISel: Legalize llvm.amdgcn.s.buffer.load
The 96-bit results need to be widened.

I find the interaction between LegalizerHelper and MIRBuilder somewhat
awkward. The custom legalization is called by the LegalizerHelper, but
then does not have access to the helper. You have to construct a new
helper, which then does not own the MachineIRBuilder, but does modify
it. Maybe custom legalization should be passed the helper?
2020-02-05 12:01:34 -05:00
Teresa Johnson
addd1606be [WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP
Summary:
Currently type test assume sequences inserted for devirtualization are
removed during WPD. This patch delays their removal until later in the
optimization pipeline. This is an enabler for upcoming enhancements to
indirect call promotion, for example streamlined promotion guard
sequences that compare against vtable address instead of the target
function, when there are small number of possible vtables (either
determined via WPD or by in-progress type profiling). We need the type
tests to correlate the callsites with the address point offset needed in
the compare sequence, and optionally to associated type summary info
computed during WPD.

This depends on work in D71913 to enable invocation of LowerTypeTests to
drop type test assume sequences, which will now be invoked following ICP
in the ThinLTO post-LTO link pipelines, and also after the existing
export phase LowerTypeTests invocation in regular LTO (which is already
after ICP). We cannot simply move the existing import phase
LowerTypeTests pass later in the ThinLTO post link pipelines, as the
comment in PassBuilder.cpp notes (it must run early because when
performing CFI other passes may disturb the sequences it looks for).

This necessitated adding a new type test resolution "Unknown" that we
can use on the type test assume sequences previously removed by WPD,
that we now want LTT to ignore.

Depends on D71913.

Reviewers: pcc, evgeny777

Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D73242
2020-02-05 08:59:48 -08:00
Matt Arsenault
b46a1225fc AMDGPU/GlobalISel: Fix processing new phi in waterfall loop
The adjusted iterator range included the last we just inserted, and
don't want to process. Figure out the new iterator range before
inserting phis. This was a harmless problem, but added an unnecessary
complication for a future patch.
2020-02-05 11:52:42 -05:00
Matt Arsenault
1da609be66 GlobalISel: Make LegalizerHelper primitives public
I want to re-use widenScalarDst/moreElementsVectorDst directly.
2020-02-05 11:52:18 -05:00
Matt Arsenault
4ba6a5e05e AMDGPU/GlobalISel: Don't use legal v2s16 G_BUILD_VECTOR
If we have s_pack_* instructions, legalize this to
G_BUILD_VECTOR_TRUNC from s32 elements. This is closer to how how the
s_pack_* instructions really behave.

If we don't have s_pack_ instructions, expand this by creating a merge
to s32 and bitcasting. This expands to the expected bit operations. I
think this eventually should go in a new bitcast legalize action type
in LegalizerHelper.

We already directly emit the shift operations in RegBankSelect for the
vector case. This could possibly be cleaned up, but I also may want to
defer doing this expansion to selection anyway. I'll see about that
when I try to actually match VOP3P instructions.

This breaks the selection of the build_vector since tablegen doesn't
know how to match G_BUILD_VECTOR_TRUNC yet, so just xfail it for now.
2020-02-05 11:52:18 -05:00
Momchil Velikov
ad184987b8 [ARM][TargetParser] Improve handling of dependencies between target features
The patch at https://reviews.llvm.org/D64048 added "negative"
dependency handling in `ARM::appendArchExtFeatures`: feature "noX"
removes all features, which imply "X".

This patch adds the "positive" handling: feature "X" adds all the
feature strings implied by "X".

(This patch also comes from the suggestion here
https://reviews.llvm.org/D72633#inline-658582)

Differential Revision: https://reviews.llvm.org/D72762
2020-02-05 16:07:51 +00:00
Alex Richardson
16edf004ac Re-enable a update_cc_test_checks.py tests
This test was not running because it still had a REQUIRES: python3 line.
As this is no longer necessary, remove the REQUIRES to run the test
again.
2020-02-05 15:37:30 +00:00
Sjoerd Meijer
9711abe6d3 [ARM][MVE] LowOverheadLoops: DCE on the iteration count setup expression
Once we have created a tail-predicated hardware-loop, and thus know the number
of elements that are processed, we want to clean-up the iteration count
expression of that loop. In D73682, we bailed the analysis on conditionally
executed instructions. This adds support for IT-blocks, so that we can handle
these cases again. The restriction is that we only support IT blocks containing
1 statement, but that seems to cover most cases and forms of the iteration
count expression.

Differential Revision: https://reviews.llvm.org/D73947
2020-02-05 15:15:46 +00:00
Momchil Velikov
38241fc5d7 [ARM] Correct syntax of the CLRM insn
The predicate should be adjacent to the opcode.

Differential Revision: https://reviews.llvm.org/D74040
2020-02-05 13:54:34 +00:00
Andrea Di Biagio
6d0e7e12a0 [MCA] Remove verification check on MayLoad and MayStore. NFCI
Field NumMicroOpcodes is currently used by mca to model the number of uOPs
dispatched from the uOp-Queue to the out of order backend.  From a 'dispatch'
point of view, an instruction with zero opcodes is still valid; it simply
doesn't consume any dispatch group slots.

However, mca doesn't expect an instruction with zero uOPs to consume pipeline
resources because it is seen as a contradiction.  In practice, it only makes
sense if such an instruction is eliminated and never really executed. It may be
that mca is being too conservative here. However I believe that mca is right,
and we should probably check that inconsistency in CodeGenSchedule.cpp (when we
also verify scheduling classes in general).

This patch removes the check for MayLoad and MayStore in mca.  That check is
probably too conservative: we are already checking if a zero-uops instruction
consumes any processor resources. Note also that instructions with unmodelled
side-effects also tend to set the MayLoad/MayStore flags even if - theoretically
speaking - they might not even consume any hw resources in practice.

In future we may want to implement different checks (possibly outside of mca)
and potentially revisit the logic in mca that verifies instructions.
For that reason I have raised PR44797.
2020-02-05 13:50:01 +00:00
Simon Pilgrim
29c63c6731 visitINSERT_VECTOR_ELT - pull out repeated dyn_cast. NFCI.
This always gets called at least once.
2020-02-05 13:30:54 +00:00
Sam Parker
3b7b1e369e [ARM][LowOverheadLoops] Fix loop count chain
Checking that the use-def chain that performs the loop count
isSafeToRemove is not sufficient because it means that we can
remove register copies that we need to restore lr to its correct
value. This change now prevents the transform from kicking in for the
'remove-elem-moves' test which needs to addressed later on.

Differential Revision: https://reviews.llvm.org/D74037
2020-02-05 13:21:51 +00:00
Sam Parker
4a2128748f [ARM][LowOverheadLoops] Ensure memory predication
While validating each MVE instruction, check that all instructions
that touch memory are somehow predicated upon the VCTP.

Differential Revision: https://reviews.llvm.org/D73616
2020-02-05 13:19:08 +00:00
Simon Pilgrim
b6af7e47c1 [X86] Fix missing load latencies (PR36894)
We weren't account for load latencies in the SSE42/AES/CLMUL schedule classes
2020-02-05 11:53:16 +00:00
Ayke van Laethem
0b73574e68 [AVR] Add disassembly tests for supported instructions
The disassembler of the AVR backend is incomplete: most instructions do
not correctly disassemble yet.

This patch is the first in a series to add disassembly support to the
AVR backend. It starts with adding disassembler tests for instructions
that already disassemble correctly.

Differential Revision: https://reviews.llvm.org/D73911
2020-02-05 12:38:51 +01:00
Martin Storsjö
73cf6c4a89 Partially revert c1c9819ef91aab51b5a23fb3027adac5a2f551cc
Revert the part of that change that broke the
test Passes/./PluginsTests/PluginsTests.LoadPlugin.
2020-02-05 13:29:48 +02:00
Martin Storsjö
d6b7a01674 [CMake] Add missing component dependencies, to fix building for mingw with BUILD_SHARED_LIBS
Differential Revision: https://reviews.llvm.org/D73840
2020-02-05 13:10:27 +02:00
Sebastian Neubauer
097cac0b5f [AMDGPU] Fix lowering a16 image intrinsics
scalar_to_vector takes only one argument, not two.
The a16 tests now also check the packing of coordinates into registers

Differential Revision: https://reviews.llvm.org/D73482
2020-02-05 10:54:34 +01:00
Sebastian Neubauer
c9e54cbc12 [AMDGPU] Use v3f32 type in image instructions
This should lower the amount of used registers for gfx9.

I updated some of the changed tests with the update script because
changing them by hand is tedious.

Differential Revision: https://reviews.llvm.org/D73884
2020-02-05 10:35:41 +01:00
Georgii Rymar
38ad9710a6 [yaml2obj][obj2yaml] - Simplify format of the SHT_LLVM_ADDRSIG section.
Previously the description allowed to describe symbols with use of
`Name` and `Index` keys. This patch removes them and now it is still
possible to use either names or symbol indexes, but the code is simpler
and the format is slightly different.

Such a change will be useful for another patches, e.g:
https://reviews.llvm.org/D73788#inline-671077

Differential revision: https://reviews.llvm.org/D73888
2020-02-05 12:33:14 +03:00
Djordje Todorovic
73925edae1 [DebugInfo] Avoid the call site param for mem instrs with multiple defs
We currently only handle mem instructions with a single define.
Avoid the call site parameter debug info when we find the case with
multiple defs, rather than throwing an assert.

Differential Revision: https://reviews.llvm.org/D73954
2020-02-05 10:03:14 +01:00
Craig Topper
145a539e76 [X86] Add a DAG combine for (i32 (sext (i8 (x86isd::setcc_carry)))) -> (i32 (x86isd::setcc_carry)) and remove isel patterns.
Same for any_extend though we don't have coverage for that.

The test changes are because isel didn't check one use of the
setcc_carry. So in isel we would end up with two different
sized setcc_carry instructions. And since it clobbers
the flags we would need to recreate the flags for the second
instruction.

This code handles additional uses by truncating the new wide
setcc_carry back to the original size for those uses.
2020-02-04 22:40:36 -08:00
Petr Hosek
ff60cb9276 [CMake] Passthrough CMAKE_SYSTEM_NAME to default builtin and runtimes target
When building the default builtin and runtimes target, set the
CMAKE_SYSTEM_NAME to the current one. This is not necessary on
Linux and Darwin, but it appears to be necessary on Windows,
otherwise CMake fails.

Differential Revision: https://reviews.llvm.org/D73811
2020-02-04 22:38:20 -08:00
Jan Vesely
d881d8ad71 AMDGPU/EG,CM: Implement fsqrt using recip(rsqrt(x)) instead of x * rsqrt(x)
The old version might be faster on EG (RECIP_IEEE is Trans only),
but it'd need extra corner case checks.
This gives correct corner case behaviour and saves a register.
Fixes OCL CTS sqrt test (1-thread, scalar) on Turks.

Reviewer: arsenm
Differential Revision: https://reviews.llvm.org/D74017
2020-02-05 00:24:07 -05:00
Thomas Lively
352e779087 Revert "[WebAssembly][InstrEmitter] Foundation for multivalue call lowering"
Summary:
This reverts commit 3ef169e586f4d14efe690c23c878d5aa92a80eb5. The
purpose of this commit was to allow stack machines to perform
instruction selection for instructions with variadic defs. However,
MachineInstrs fundamentally cannot support variadic defs right now, so
this change does not turn out to be useful.

Depends on D73927.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73928
2020-02-04 20:04:59 -08:00
Matt Arsenault
bce6c219c5 AMDGPU: Correct memory size for image intrinsics
This was incorrectly rounding up to the next power of 2. v4f32 was
rounding up to v8f32, which was just wrong. There are also v3i16/v3f16
available in MVT, so we don't even need to round the f16 cases
anymore. Additionally, this field is really an EVT so we don't even
need to consider this.

Also switch some asserts to return invalid. We should have an IR
verifier for these intrinsic return types, but for now it's better to
not assert on IR that passes the verifier.

This should also probably be fixed to consider that dmask is really
eliminating some of the loaded components.
2020-02-04 22:29:23 -05:00