mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

215401 Commits

Author SHA1 Message Date
Xiang1 Zhang
38ab3a2d9f [X86] Support AMX fast register allocation
Differential Revision: https://reviews.llvm.org/D100026
2021-05-08 14:21:11 +08:00
Arthur Eubanks
d59e5b412e Fix build after 34a8a437b 2021-05-07 23:18:44 -07:00
Xiang1 Zhang
f668ad4ced Revert "[X86] Support AMX fast register allocation"
This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.
2021-05-08 13:43:32 +08:00
Xiang1 Zhang
fe856bad78 [X86] Support AMX fast register allocation 2021-05-08 13:27:21 +08:00
Michael Liao
6d153ca7f4 Replace a remaining CRLF with LF. NFC. 2021-05-08 01:09:15 -04:00
Arthur Eubanks
b987f39d75 [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose
Printing pass manager invocations is fairly verbose and not super
useful.

This allows us to remove DebugLogging from pass managers and PassBuilder
since all logging (aside from analysis managers) goes through
instrumentation now.

This has the downside of never being able to print the top level pass
manager via instrumentation, but that seems like a minor downside.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D101797
2021-05-07 21:51:47 -07:00
RamNalamothu
61d8a36289 [DebugInfo] UnwindTable::create() should not add empty rows to CFI unwind table
UnwindTable::parseRows() may return successfully if the CFIProgram has either
no CFI instructions or only DW_CFA_nop instructions and the UnwindRow return
argument will be empty. Currently the callers do not check for this case,
which leads to incorrect entries in the unwind table dumps, i.e.

  CFA=unspecified

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D101892
2021-05-08 10:19:02 +05:30
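
The guard this commit describes can be sketched with a toy model. This is only an illustration in Python, not the LLVM C++ code: `parse_rows` and `create_unwind_table` are hypothetical stand-ins, and a row is modelled as the list of non-nop CFI instructions it was built from.

```python
def parse_rows(instructions):
    """Toy stand-in for row parsing (hypothetical, not LLVM's API).

    DW_CFA_nop contributes nothing to a row, so a program that is empty
    or contains only nops yields an empty row.
    """
    return [op for op in instructions if op != "DW_CFA_nop"]


def create_unwind_table(programs):
    """Build a table from several CFI programs, skipping empty rows."""
    rows = []
    for instructions in programs:
        row = parse_rows(instructions)
        if row:  # the check the callers were missing
            rows.append(row)
    return rows
```

With this guard, a program made only of DW_CFA_nop instructions adds nothing to the table instead of producing a row with an unspecified CFA.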
Arthur Eubanks
6acb684b54 [lit] Bump up the Windows process cap from 32 to 60
At 61 or over, I see messages like

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)

  ValueError: need at most 63 handles, got a sequence of length 64

60 seems to work for me.

If this causes issues for anybody else, feel free to revert.
2021-05-07 18:13:38 -07:00
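
A minimal sketch of such a cap, assuming a helper that picks the worker count. The function name `usable_workers` and the constant name are hypothetical and this is not lit's actual code; only the limit of 60 comes from the commit message.

```python
import sys

# Limit taken from the commit message; the constant name is hypothetical.
WINDOWS_MAX_WORKERS = 60


def usable_workers(requested, platform=sys.platform):
    """Return the worker count to use, capped on Windows only."""
    if platform == "win32":
        return min(requested, WINDOWS_MAX_WORKERS)
    return requested
```

For example, `usable_workers(128, "win32")` yields 60, while other platforms keep the requested count.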
Arthur Eubanks
9df931e590 Revert "lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162"
This reverts commit d319005a3746a7661c8c9a3302266b6ff7cf61be.

Causing messages like:

  File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait
    res = _winapi.WaitForMultipleObjects(L, False, timeout)
ValueError: need at most 63 handles, got a sequence of length 74
2021-05-07 18:00:11 -07:00
Arthur Eubanks
1a92538daa [gn build] Manually port 5b158093e 2021-05-07 17:54:32 -07:00
Amara Emerson
9146866d14 [AArch64][GlobalISel] Create a new minimal combiner pass just for -O0.
We never bothered to have a separate set of combines for -O0 in the prelegalizer
before. This results in some minor performance hits for a mode where performance
isn't a concern (although not regressing code size significantly is still preferable).

This also removes the CSE option since we don't need it for -O0.

Through experiments, I've arrived at a set of combines that gets the most code
size improvement at -O0, while reducing the amount of time spent in the combiner
by around 35% give or take.

Differential Revision: https://reviews.llvm.org/D102038
2021-05-07 17:01:27 -07:00
Amara Emerson
818c390c9c [GlobalISel] Don't form zero/sign extending loads for atomics.
For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or
G_SEXTLOAD.

Differential Revision: https://reviews.llvm.org/D101932
2021-05-07 16:41:48 -07:00
Arthur Eubanks
86faf963ab [NewPM] Move analysis invalidation/clearing logging to instrumentation
We're trying to move DebugLogging into instrumentation, rather than
being part of PassManagers/AnalysisManagers.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D102093
2021-05-07 15:25:31 -07:00
Jessica Paquette
f2be584bf8 [AArch64][GlobalISel] Legalize narrow type G_CTPOPs
Using `clampScalar` here because we ought to mark s128 as custom eventually.

(Right now, it will just fall back.)

With this legalization, we get the same code as SDAG:
https://godbolt.org/z/TneoPKrKG

Differential Revision: https://reviews.llvm.org/D100908
2021-05-07 14:52:23 -07:00
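
Why widening a narrow population count is safe can be checked with a small Python model. This is a sketch under assumptions, not the legalizer code: it assumes the narrow operand is zero-extended before counting, and the 32-bit target width and the function names are hypothetical.

```python
def ctpop(value, bits):
    """Count the set bits in the low `bits` bits of value."""
    return bin(value & ((1 << bits) - 1)).count("1")


def ctpop_widened(value, narrow_bits, wide_bits=32):
    """Zero-extend a narrow value, then count bits at the wider width."""
    zero_extended = value & ((1 << narrow_bits) - 1)
    return ctpop(zero_extended, wide_bits)
```

Zero-extension only adds zero bits, so the count at the wider width matches the count at the original width for every input.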
Adrian Prantl
aade3db1e3 Fix the module-enabled build by removing a redundant type definition. 2021-05-07 14:45:17 -07:00
Florian Hahn
212afd7758 [LV] Remove reference of PHI from comment, they are not recorded (NFC).
The comment incorrectly states that the PHI is recorded. That's not
accurate, only the recipe for the incoming value is recorded.

Suggested post-commit for 4ba8720f8844.
2021-05-07 21:34:23 +01:00
Andrea Di Biagio
ba5ec98548 [MCA][RegisterFile] Fix register class check for move elimination (PR50265)
The register file should always check if the destination register is from a
register class that allows move elimination.

Before this change, the check on the register class was only performed in a few
very specific cases. However, it should have always been performed.
This patch fixes the issue.

Note that none of the upstream scheduling models is currently affected by this
bug, so there is no test for it. The issue was found by Roman while working on
the znver3 model. I was able to reproduce the issue locally by tweaking the
btver2 model. I then verified that this patch fixes the issue.
2021-05-07 21:30:25 +01:00
Florian Hahn
2803fff409 [LV] Assert if trying to sink replicate region into another region (NFC)
Currently sinking a replicate region into another replicate region is
not supported. Add an assert, to make the problem more obvious, should
it occur.

Discussed post-commit for ccebf7a1096a.
2021-05-07 21:25:35 +01:00
Florian Hahn
d1b5132397 [LV] Rename Region to TargetRegion, similar to SinkRegion (NFC).
Adjust the name to make it clearer this is the region containing the
target recipe, similar to SinkRegion below.

Suggested post-commit for ccebf7a1096a.
2021-05-07 21:25:35 +01:00
Arthur Eubanks
4eb0ba33d1 Revert "[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST"
This reverts commit 0791f968fee259e5c34523167bd58179b8b081c2.

Causing crashes: https://crbug.com/1206764
2021-05-07 12:05:16 -07:00
Florian Hahn
df7e45dd98 [SCEV] Be more careful when traversing phis in isImpliedViaMerge.
I think currently isImpliedViaMerge can incorrectly return true for phis
in a loop/cycle, if the found condition involves the previous value of

Consider the case in exit_cond_depends_on_inner_loop.

At some point, we call (modulo simplifications)
isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1).

The existing code tries to prove IncV <= -1 for all incoming values
IncV using the found condition (%call <= -1). At the moment this succeeds,
but only because it does not compare the same runtime value. The found
condition checks the value of the last iteration, but the incoming value
is from the *previous* iteration.

Hence we incorrectly determine that the *previous* value was <= -1,
which may not be true.

I think we need to be more careful when looking at the incoming values
here. In particular, we need to rule out that a found condition refers to
any value that may refer to one of the previous iterations. I'm not sure
there's a reliable way to do so (that also works for irreducible control
flow).

So for now this patch adds an additional requirement that the incoming
value must properly dominate the phi block. This should ensure the
values do not change in a cycle. I am not entirely sure if this will catch
all cases and I appreciate a thorough second look in that regard.

Alternatively we could also unconditionally bail out in this case,
instead of checking the incoming values.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101829
2021-05-07 19:52:29 +01:00
Fangrui Song
e03ea6bfdf [unittest] Fix -Wunused-variable after D94717 2021-05-07 11:42:16 -07:00
Krzysztof Parzyszek
fd58086fcb Allow empty value list in propagateMetadata(Inst, ArrayOf...)
This will allow writing
  propagateMetadata(Inst, collectInterestingValues(...))
without concern about empty lists. In case of an empty list,
Inst is returned without any changes.
2021-05-07 13:20:50 -05:00
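
The empty-list behaviour can be illustrated with a toy Python model. Everything here is hypothetical: instructions are plain dicts, and the merge rule (keep only entries identical across all values) is a simplification chosen for the sketch, not the merge logic of the LLVM function.

```python
def propagate_metadata(inst, values):
    """Toy model: copy onto inst the metadata shared by every value."""
    if not values:
        # Empty list: inst is returned without any changes.
        return inst
    merged = dict(values[0]["metadata"])
    for value in values[1:]:
        merged = {
            kind: node
            for kind, node in merged.items()
            if value["metadata"].get(kind) == node
        }
    inst["metadata"] = merged
    return inst
```

Because the empty case returns early, a caller can pass the result of a collection helper directly without checking whether it found anything.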
Fangrui Song
9289990558 Internalize some cl::opt global variables or move them under namespace llvm 2021-05-07 11:15:43 -07:00
Saleem Abdulrasool
363580b4e0 lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162
Revert the 32-process cap on Windows.  When testing with Swift, we found
that there was a time reduction for testing with the higher load.  This
should hopefully not matter much in practice.  In the case that the
original problem with python remains with a high subprocess count, we
can easily revert this change.
2021-05-07 10:22:43 -07:00
Roman Lebedev
6449ef04c8 [X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminable in RegisterFile 2021-05-07 20:11:21 +03:00
Roman Lebedev
5f3fe26c82 [X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move
It measures as such, and the reference docs agree.

I can't easily add an MCA test because there's no mnemonic for it;
it can only be disassembled or created as an MCInst.
2021-05-07 20:11:20 +03:00
Fangrui Song
0180f4a1d1 [AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Similar to X86 D73230 & 46788a21f9152be3950e57dc526454655682bdd4

With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode,
for default visibility external linkage non-ifunc-non-COMDAT definitions.

For such dso_local definitions, variable access/taking the address of a
function/calling a function will go through a local alias to avoid GOT/PLT.

Note: the 'S' inline assembly constraint refers to an absolute symbolic address
or a label reference (D46745).

Differential Revision: https://reviews.llvm.org/D101872
2021-05-07 09:44:26 -07:00
Whitney Tsang
e5ca2592d4 [LoopNest] Consider loop nest with inner loop guard using outer loop
induction variable to be perfect

This patch allows more conditional branches to be considered loop
guards, so more loop nests can be considered perfect.

Reviewed By: bmahjour, sidbav

Differential Revision: https://reviews.llvm.org/D94717
2021-05-07 16:04:18 +00:00
Simon Pilgrim
3ceb0303a8 [X86] combineXor - limit fold to non-opaque constants (PR50254)
Ensure we don't try to fold when one operand might be an opaque constant: the constant fold would fail and the reverse fold would then happen in DAGCombine.
2021-05-07 16:39:24 +01:00
Roman Lebedev
37c9dcd0cf [X86] AMD Zen 3: _REV variants of zero-cycle moves are also zero-cycle (PR50261)
Sometimes the disassembler picks _REV variants of instructions
over the plain ones, which in this case exposed an issue:
the _REV variants weren't being modelled as optimizable moves.
2021-05-07 18:27:40 +03:00
Roman Lebedev
2eec1309e5 [NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move 2021-05-07 18:27:40 +03:00
Joseph Tremoulet
b501eafd98 BasicAA: Recognize inttoptr as isEscapeSource
Pointers escape when converted to integers, so a pointer produced by
converting an integer to a pointer must not be a local non-escaping
object.

Reviewed By: nikic, nlopes, aqjune

Differential Revision: https://reviews.llvm.org/D101541
2021-05-07 07:48:50 -07:00
Sanjay Patel
6ec9b04dd2 [AArch64] add test for missed vectorization; NFC
This is a reduction of the example in:
https://llvm.org/PR50256
2021-05-07 10:45:11 -04:00
Roman Lebedev
01a7d33cb8 [NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests
Drop it just enough so it still produces the right IPC.
2021-05-07 17:06:45 +03:00
Roman Lebedev
05dc778f5c [X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:45 +03:00
Roman Lebedev
f8e6315b4a [X86] AMD Zen 3: AVX YMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:45 +03:00
Roman Lebedev
f741d942f9 [X86] AMD Zen 3: AVX XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.
2021-05-07 17:06:44 +03:00
Roman Lebedev
8c8821fc73 [X86] AMD Zen 3: SSE XMM moves are zero-cycle
I've verified this with llvm-exegesis.
This is not limited to zero registers.

Refs:
AMD SOG 19h, 2.9.4 Zero Cycle Move
The processor is able to execute certain register to register
mov operations with zero cycle delay.

Agner,
22.13 Instructions with no latency
Register-to-register move instructions are resolved at
the register rename stage without using any execution units.
These instructions have zero latency. It is possible to do six such
register renamings per clock cycle, and it is even possible to
rename the same register multiple times in one clock cycle.
2021-05-07 17:06:44 +03:00
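
The throughput figure quoted above can be turned into a back-of-the-envelope model. This is a sketch, not llvm-mca: only the six renames per cycle come from the quoted text, while `overhead_cycles` is a hypothetical fixed startup cost added to show why a short run reports a lower IPC than a long one.

```python
import math

RENAMES_PER_CYCLE = 6  # from the quoted reference text


def cycles_for_moves(num_moves, overhead_cycles=0):
    """Cycles needed to rename num_moves register moves."""
    return overhead_cycles + math.ceil(num_moves / RENAMES_PER_CYCLE)


def ipc(num_moves, overhead_cycles=0):
    """Instructions per cycle for a run of eliminated moves."""
    return num_moves / cycles_for_moves(num_moves, overhead_cycles)
```

In this model any fixed overhead is amortized as the number of moves grows, so the IPC approaches 6 from below.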
Roman Lebedev
e720a8cc78 [NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev
08ef520ecb [NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev
944bd39b12 [NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves 2021-05-07 17:06:44 +03:00
Roman Lebedev
adf9a78691 [X86] AMD Zen 3: throughput for renameable GPR moves is 6
They are resolved at the register rename stage without
using any execution units.
2021-05-07 17:06:43 +03:00
Roman Lebedev
87a05050e0 [NFC][X86] AMD Zen 3: move sched classes for renameable moves together 2021-05-07 17:06:43 +03:00
Roman Lebedev
28604fa5a6 [NFC][X86][MCA] Increase iteration count in reg move elimination tests
So the IPC actually stabilizes at 6.
2021-05-07 17:06:43 +03:00
Stephen Tozer
77766b7092 Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands"
Reapply b623df3c, which was reverted while reverting a different patch
with a breaking change. There are no underlying issues with this patch,
so no changes have been made to the original patch.

This reverts commit b11e4c990771541e440861f017afea7b4ba162f4.
2021-05-07 14:55:02 +01:00
Simon Pilgrim
9edfcde2eb [CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Simon Pilgrim
b5e4cc0124 [DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI.
Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.
2021-05-07 14:48:23 +01:00
Benjamin Kramer
f2e0efd85f Retire TargetRegisterInfo::getSpillAlignment
getSpillAlign does the same thing.
2021-05-07 15:16:22 +02:00
Sebastian Neubauer
154e1ab9f4 [AMDGPU] Restrict immediate scratch offsets
gfx9 does not work with negative offsets, gfx10 works only with
aligned negative offsets, but not with unaligned negative offsets.

This is slightly more conservative than needed: gfx9 does support
negative offsets when a VGPR address is used, and gfx10 supports
negative, unaligned offsets when an SGPR address is used, but we
do not make use of that with this patch.

Differential Revision: https://reviews.llvm.org/D101292
2021-05-07 14:51:32 +02:00
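
The conservative rule stated in this commit can be written as a small predicate. This is a Python sketch under assumptions, not the AMDGPU backend code: the function name is hypothetical, the 4-byte alignment is an assumed value that the commit message does not state, and the encodable offset range is ignored entirely.

```python
def is_allowed_scratch_offset(offset, generation, alignment=4):
    """Apply the conservative rule for immediate scratch offsets.

    alignment=4 is an assumption made for this sketch.
    """
    if offset >= 0:
        return True  # the restriction only concerns negative offsets
    if generation == "gfx9":
        return False  # negative offsets are rejected outright
    if generation == "gfx10":
        return offset % alignment == 0  # only aligned negative offsets
    raise ValueError(f"unhandled generation: {generation}")
```

For example, -8 is accepted on gfx10 but rejected on gfx9, and -3 is rejected on both.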