1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 11:13:28 +01:00
Commit Graph

197635 Commits

Author SHA1 Message Date
Simon Pilgrim
ec56a4b22c TextAPIWriter.h - reduce MemoryBuffer.h include to forward declarations. NFC. 2020-06-02 11:06:10 +01:00
Florian Hahn
f52bf2135d [LV] Make sure the MaxVF is a power-of-2 by rounding down.
LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c9199150cbdee29acef756c19660ca1.

However, if the widest type is not a power-of-2 the computed MaxVF won't
be a power-of-2 either. This patch updates computeFeasibleMaxVF to
ensure the returned value is a power-of-2 by rounding down to the
nearest power-of-2.

Fixes PR46139.

Reviewers: Ayal, gilr, rengolin

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D80870
2020-06-02 10:40:49 +01:00
Denis Antrushin
bc1dd0f13a [EarlyCSE] Common gc.relocate calls.
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
2020-06-02 12:25:43 +03:00
Simon Pilgrim
37450a7b9e [VectorCombine][X86] Add loaded insert tests from D80885 2020-06-02 10:04:05 +01:00
Simon Atanasyan
55666dcc93 [mips] Support 64-bit relative relocations
MIPS 64-bit ABI does not provide special PC-relative relocation like
R_MIPS_PC32 in 32-bit case. But we can use a "chain of relocation"
defined by N64 ABIs. In that case one relocation record might contain up
to three relocations which applied sequentially. Width of a final relocation
mask applied to the result of relocation depends on the last relocation
in the chain. In case of 64-bit PC-relative relocation we need the following
chain: `R_MIPS_PC32 | R_MIPS_64`. The first relocation calculates an
offset, but does not truncate the result. The second relocation just
apply calculated result as a 64-bit value.

The 64-bit PC-relative relocation might be useful in generation of
`.eh_frame` sections to escape passing `-Wl,-z,notext` flags to linker.

Differential Revision: https://reviews.llvm.org/D80390
2020-06-02 11:44:11 +03:00
Kazushi (Jam) Marukawa
b3fa0c070e [VE] Support I32/F32 registers in assembler parser
Summary:
Support I32/F32 registers in assembler parser and add regression tests of LD/ST
instructions.

Differential Revision: https://reviews.llvm.org/D80777
2020-06-02 10:22:45 +02:00
Clement Courbet
0857d5bd0f [llvm-exegesis] Fix D80610.
Summary:
Using a .data() member on a StringRef was discarding the StringRef
size, breaking llvm-exegesis on machines with counter sums (e.g.
Zen2).

Reviewers: oontvoo

Subscribers: mstojanovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80982
2020-06-02 10:10:01 +02:00
Sam Parker
7788e0be2b [NFC][ARM][AArch64] Test runs
Add code size tests runs for memory ops for both architectures.
2020-06-02 09:05:30 +01:00
Sriraman Tallam
58452e0fcc Options for Basic Block Sections, enabled in D68063 and D73674.
This patch adds clang options:
-fbasic-block-sections={all,<filename>,labels,none} and
-funique-basic-block-section-names.
LLVM Support for basic block sections is already enabled.

+ -fbasic-block-sections={all, <file>, labels, none} : Enables/Disables basic
block sections for all or a subset of basic blocks. "labels" only enables
basic block symbols.
+ -funique-basic-block-section-names: Enables unique section names for
basic block sections, disabled by default.

Differential Revision: https://reviews.llvm.org/D68049
2020-06-02 00:23:32 -07:00
Denis Antrushin
40fd2ba064 [StatepointLowering] Handle UNDEF gc values.
Do not spill UNDEF GC values. Instead, replace corresponding
gc.relocate intrinsic with an (arbitrary, but recognizable) constant.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80714
2020-06-02 10:18:33 +03:00
Dominik Montada
505cc9fd53 [GlobalISel] Combine scalar unmerge(trunc)
Summary:
Combine unmerge(trunc) to enable other merge combines.
Without this combine, the scalar unmerge(trunc(merge))
pattern cannot be combined and easily lead to
hard-to-legalize merge/unmerge artifacts.

Reviewed By: arsenm

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79567
2020-06-02 08:56:18 +02:00
Dominik Montada
93d1e1c124 [NFC] Move vector unmerge(trunc) combine to function
In preparation of D79567, move arsenm's vector unmerge(trunc)
combine to a new function `tryFoldUnmergeCast`
2020-06-02 08:56:17 +02:00
Xing GUO
2955eebf43 [ObjectYAML][DWARF] Let dumpPubSection return DWARFYAML::PubSection.
Summary: This patch addresses comments in [D80722](https://reviews.llvm.org/D80722#inline-742353)

Reviewers: grimar, jhenderson

Reviewed By: grimar, jhenderson

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80861
2020-06-02 14:38:26 +08:00
Yevgeny Rouban
f732af67dd [BrachProbablityInfo] Proportional distribution of reachable probabilities
When fixing probability of unreachable edges in
BranchProbabilityInfo::calcMetadataWeights() proportionally distribute
remainder probability over the reachable edges. The old implementation
distributes the remainder probability evenly.
See examples in the fixed tests.

Reviewers: yamauchi, ebrevnov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80611
2020-06-02 12:06:52 +07:00
Richard Smith
04069444bf Fix violations of [basic.class.scope]p2.
These cases all follow the same pattern:

struct A {
  friend class X;
  //...
  class X {};
};

But 'friend class X;' injects 'X' into the surrounding namespace scope,
rather than introducing a class member. So the second 'class X {}' is a
completely different type, which changes the meaning of the earlier name
'X' from '::X' to 'A::X'.

Additionally, the friend declaration is pointless -- members of a class
don't need to be befriended to be able to access private members.
2020-06-01 22:03:05 -07:00
Craig Topper
e7fae8455d [X86] Fix a few recursivelyDeleteUnusedNodes calls that were trying to delete nodes before their user was really gone.
We looked through a truncate to get to the load. So we should be
deleting the truncate first.

There is a check that the node is really unused before deleting
so this didn't cause a functional issue.
2020-06-01 21:55:13 -07:00
Yevgeny Rouban
129b7e4405 [BrachProbablityInfo] Rename loop variables. NFC 2020-06-02 10:55:27 +07:00
Vedant Kumar
5c16010e47 [docs] Sketch outline for HowToUpdateDebugInfo.rst
Summary:
Sketch the outline for a new document that explains how to update debug
info in various kinds of code transformations.

Some of the guidelines that belong in HowToUpdateDebugInfo.rst were in
SourceLevelDebugging.rst already under the debugify section. It seems
like the distinction between the two docs ought to be that the former is
more prescriptive, while the latter is more descriptive.

To that end I've consolidated the "how to update debug info" guidelines
which were in SourceLevelDebugging.rst into the new doc, along with the
information about using "debugify" to test transformations. Since we've
added a mir-debugify pass, I've described that as well.

Reviewers: aprantl, jmorse, chrisjackson, dsanders

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80052
2020-06-01 16:45:18 -07:00
Amara Emerson
74c92ba816 [AArch64][GlobalISel] Split G_GLOBAL_VALUE into ADRP + G_ADD_LOW and optimize.
The concept of G_GLOBAL_VALUE is nice and simple, but always using it as the
representation for global var addressing until selection time creates some
problems in optimizing accesses in certain code/relocation models.

The problem comes from trying to optimize adrp -> add -> load/store sequences
in the most common "small" code model. These accesses can be optimized into an
adrp -> load with the add offset being folded into the load's immediate field.
If we try to keep all global var references as a single generic instruction
then by the time we get to the complex operand trying to match these, we end up
generating an adrp at the point of use. The real issue here is that we don't
have any form of CSE during selection, so the code size will bloat from many
redundant adrp's.

This patch custom legalizes small code mode non-GOT G_GLOBALs into target ADRP
and a new "target specific generic opcode" G_ADD_LOW. We also teach the
localizer to localize these instructions via the custom hook that was added
recently. Finally, the complex pattern for indexed loads/stores is extended to
try to fold these G_ADD_LOW instructions into the load immediate.

On -O0 CTMark, we see a 0.8% geomean code size improvement. We should also see
some minor performance improvements too.

Differential Revision: https://reviews.llvm.org/D78465
2020-06-01 16:00:56 -07:00
Amara Emerson
9e0493b789 [AArch64] Fix CollectLOH creating an AdrpAdd LOH when there's a live used reg
between the two instructions.

If there's a pattern like:
$xA = ADRP foo @PAGE
[some killing use of reg Xb]
$Xb = ADDXri $Xa, 0, @PAGEOFF

CollectLOH would create an AdrpAdd LOH that resulted in the linker optimizing
this sequence into:
$xB = ADR foo
[some killing use of reg $Xb]
... and therefore clobbers the live $Xb register that was used by the
instruction in between.

This was discovered by a GlobalISel patch D78465 which broke up global variable
accesses into two pseudos, which in some cases could be moved apart.

Differential Revision: https://reviews.llvm.org/D80834
2020-06-01 16:00:55 -07:00
Vedant Kumar
22f3fd7742 [LiveDebugValues] Remove early-exit when testing regmasks, NFC
In transferRegisterDef, if the instruction has a regmask attached, we'll
check if any currently used register is clobbered by the regmask.

The early exit in this scan isn't necessary, costs a set lookup, and is
almost never taken [1]. Delete it.

[1]
http://lab.llvm.org:8080/coverage/coverage-reports/coverage/Users/buildslave/jenkins/workspace/coverage/llvm-project/llvm/lib/CodeGen/LiveDebugValues.cpp.html#L1136
2020-06-01 15:16:10 -07:00
Matt Arsenault
3cd292c66e AMDGPU: Change internal tracking of wave size
Store the log2 wave size instead of forcing division and log2
operations when querying either.
2020-06-01 17:55:08 -04:00
Joseph Huber
76c68e0c0b [OpenMP] Replace Clang's OpenMP RTL Definitions with OMPKinds.def
Summary: This changes Clang's generation of OpenMP runtime functions to use the types and functions defined in OpenMPKinds and OpenMPConstants. New OpenMP runtime function information should now be added to OMPKinds.def. This patch also changed the definitions of __kmpc_push_num_teams and __kmpc_copyprivate to match those found in the runtime.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, AndreyChurbanov, openmp-commits, fghanim, hiraditya, sstefan1, cfe-commits, llvm-commits

Tags: #openmp, #clang, #llvm

Differential Revision: https://reviews.llvm.org/D80222
2020-06-01 16:23:10 -04:00
Sterling Augustine
d40cd3b3ce For --relativenames, ignore directory 0, which is the comp_dir.
Update for upstream comments. Improve test by writing all the debug
info by hand.

Reviewers: dblaikie, jhenderson

Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80168
2020-06-01 13:13:37 -07:00
Mircea Trofin
368ac03869 [llvm][NFC] Cache FAM in InlineAdvisor
Summary:
This simplifies the interface by storing the function analysis manager
with the InlineAdvisor, and, thus, not requiring it be passed each time
we inquire for an advice.

Reviewers: davidxl, asbirlea

Subscribers: eraman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80405
2020-06-01 13:02:34 -07:00
Daniel Grumberg
cc866fecc7 Add DIAError.h to list of headers excluded from the LLVM_DebugInfo_PDB module
Differential Revision: https://reviews.llvm.org/D80808
2020-06-01 21:01:05 +01:00
Florian Hahn
2456f3f2db [Matrix] Implement matrix index expressions ([][]).
This patch implements matrix index expressions
(matrix[RowIdx][ColumnIdx]).

It does so by introducing a new MatrixSubscriptExpr(Base, RowIdx, ColumnIdx).
MatrixSubscriptExprs are built in 2 steps in ActOnMatrixSubscriptExpr. First,
if the base of a subscript is of matrix type, we create a incomplete
MatrixSubscriptExpr(base, idx, nullptr). Second, if the base is an incomplete
MatrixSubscriptExpr, we create a complete
MatrixSubscriptExpr(base->getBase(), base->getRowIdx(), idx)

Similar to vector elements, it is not possible to take the address of
a MatrixSubscriptExpr.
For CodeGen, a new MatrixElt type is added to LValue, which is very
similar to VectorElt. The only difference is that we may need to cast
the type of the base from an array to a vector type when accessing it.

Reviewers: rjmccall, anemet, Bigcheese, rsmith, martong

Reviewed By: rjmccall

Differential Revision: https://reviews.llvm.org/D76791
2020-06-01 20:08:49 +01:00
Sanjay Patel
1a3f9700f3 [InstCombine] fix use of base VectorType; NFC
SimplifyDemandedVectorElts() bails out on ScalableVectorType
anyway, but we can exit faster with the external check.

Move this to a helper function because there are likely other
vector folds that we can try here.
2020-06-01 14:28:31 -04:00
Matt Arsenault
6b892181f5 AMDGPU: Fix not emitting nofpexcept on fdiv expansion
In this awkward case, we have to emit custom pseudo-constrained FP
wrappers. InstrEmitter concludes that since a mayRaiseFPException
instruction had a chain, it can't add nofpexcept.

Test deferred until mayRaiseFPException is really set on everything.
2020-06-01 14:10:26 -04:00
Vedant Kumar
b5e5fd1027 [LiveDebugValues] Add LocIndex::u32_{location,index}_t types for readability, NFC
This is per Adrian's suggestion in https://reviews.llvm.org/D80684.
2020-06-01 11:02:36 -07:00
Vedant Kumar
6f84fd7763 [LiveDebugValues] Speed up removeEntryValue, NFC
Summary:
Instead of iterating over all VarLoc IDs in removeEntryValue(), just
iterate over the interval reserved for entry value VarLocs. This changes
the iteration order, hence the test update -- otherwise this is NFC.

This appears to give an ~8.5x wall time speed-up for LiveDebugValues when
compiling sqlite3.c 3.30.1 with a Release clang (on my machine):

```
          ---User Time---   --System Time--   --User+System--   ---Wall Time--- --- Name ---
  Before: 2.5402 ( 18.8%)   0.0050 (  0.4%)   2.5452 ( 17.3%)   2.5452 ( 17.3%) Live DEBUG_VALUE analysis
   After: 0.2364 (  2.1%)   0.0034 (  0.3%)   0.2399 (  2.0%)   0.2398 (  2.0%) Live DEBUG_VALUE analysis
```

The change in removeEntryValue() is the only one that appears to affect
wall time, but for consistency (and to resolve a pending TODO), I made
the analogous changes for iterating over SpillLocKind VarLocs.

Reviewers: nikic, aprantl, jmorse, djtodoro

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80684
2020-06-01 11:02:36 -07:00
Matt Arsenault
0ba1ec26dd DAG: Fix getNode dropping flags if there's a glue output
The AMDGPU non-strict fdiv lowering needs to introduce an FP mode
switch in some cases, and has custom nodes to provide chain/glue for
the intermediate FP operations. We need to propagate nofpexcept here,
but getNode was dropping the flags.

Adding nofpexcept in the AMDGPU custom lowering is left to a future
patch.

Also fix a second case where flags were dropped, but in this case it
seems it just didn't handle this number of operands.

Test will be included in future AMDGPU patch.
2020-06-01 13:48:02 -04:00
Hiroshi Yamauchi
c65f25e192 [PGO] Improve the working set size heuristics under the partial sample PGO.
Summary:
The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
under the partial sample PGO may not be accurate because the profile is partial
and the number of hot profile counters in the ProfileSummary may not reflect the
actual working set size of the program being compiled.

To improve this, the (approximated) ratio of the the number of profile counters
of the program being compiled to the number of profile counters in the partial
sample profile is computed (which is called the partial profile ratio) and the
working set size of the profile is scaled by this ratio to reflect the working
set size of the program being compiled and used for the working set size
heuristics.

The partial profile ratio is approximated based on the number of the basic
blocks in the program and the NumCounts field in the ProfileSummary and computed
through the thin LTO indexing. This means that there is the limitation that the
scaled working set size is available to the thin LTO post link passes only.

Reviewers: davidxl

Subscribers: mgorny, eraman, hiraditya, steven_wu, dexonsmith, arphaman, dang, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79831
2020-06-01 10:29:23 -07:00
Matt Arsenault
3fc69c930e AMDGPU: Fix test in code directory 2020-06-01 13:26:51 -04:00
Matt Arsenault
bf7398d4cf AMDGPU: Remove dead file 2020-06-01 13:26:51 -04:00
hsmahesha
ed5decd2c8 [AMDGPU/MemOpsCluster] Let mem ops clustering logic also consider number of clustered bytes
Summary:
While clustering mem ops, AMDGPU target needs to consider number of clustered bytes
to decide on max number of mem ops that can be clustered. This patch adds support to pass
number of clustered bytes to target mem ops clustering logic.

Reviewers: foad, rampitec, arsenm, vpykhtin, javedabsar

Reviewed By: foad

Subscribers: MatzeB, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, javed.absar, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80545
2020-06-01 22:52:34 +05:30
Stanislav Mekhanoshin
d47b2c2a4b Temporarily removed unstable test. NFC. 2020-06-01 10:18:54 -07:00
Matt Arsenault
7e6b33626b AMDGPU: Fix alignment for dynamic allocas
The alignment value also needs to be scaled by the wave size.
2020-06-01 13:06:37 -04:00
Stanislav Mekhanoshin
3f75cfd780 Update some names in test. NFC.
There seems to be some instability with IR nameing between
platforms. Attempted to fix it with replacing dot-numbered
names.
2020-06-01 09:11:18 -07:00
Fangrui Song
18064a73bd [Object] Add DF_1_PIE
This flag (and the whole field DT_FLAGS_1) originated from Solaris. I intend to use it in an LLD patch D80872.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D80871
2020-06-01 08:56:02 -07:00
Sanjay Patel
8cc0da7038 [InstCombine] add test for select-of-shuffle; NFC
This is based on an example in D80658
2020-06-01 11:52:07 -04:00
Stanislav Mekhanoshin
17a709861f Process gep (phi ptr1, ptr2) in SROA
Differential Revision: https://reviews.llvm.org/D79218
2020-06-01 08:41:05 -07:00
Sam Clegg
232d118233 [WebAssembly] Update test expectations
simd-2.C now compiles thanks to:
  https://github.com/WebAssembly/wasi-libc/pull/183

Differential Revision: https://reviews.llvm.org/D80930
2020-06-01 08:35:27 -07:00
Sanjay Patel
24f9324c29 [InstNamer] use 'i' for Instructions, not 'tmp'
As discussed in https://bugs.llvm.org/show_bug.cgi?id=45951 and
D80584, the name 'tmp' is almost always a bad choice, but we have
a legacy of regression tests with that name because it was baked
into utils/update_test_checks.py.

This change makes -instnamer more consistent (already using "arg"
and "bb", the common LLVM shorthand). And it avoids the conflict
in telling users of the FileCheck script to run "-instnamer" to
create a better regression test and having that cause a warn/fail
in update_test_checks.py.
2020-06-01 11:11:14 -04:00
Ehud Katz
1b0a29266b [StructurizeCFG] Fix an incorrect comment, NFC. 2020-06-01 17:42:09 +03:00
James Henderson
47c0f73fb2 [Support] Add more context to DataExtractor getLEB128 errors
Reviewed by: clayborg, dblaikie, labath

Differential Revision: https://reviews.llvm.org/D80799
2020-06-01 14:00:01 +01:00
James Henderson
b5cd4f0921 [DebugInfo] Add use of truncating data extractor to debug line parsing
This will ensure that nothing can ever start parsing data from a future
sequence and part-read data will be returned as 0 instead.

Reviewed by: aprantl, labath

Differential Revision: https://reviews.llvm.org/D80796
2020-06-01 12:33:21 +01:00
Sanjay Patel
804d4bd875 [utils] change default nameless value to "TMP"
This is effectively reverting rGbfdc2552664d to avoid test churn
while we figure out a better way forward.

We at least salvage the warning on name conflict from that patch
though.

If we change the default string again, we may want to mass update
tests at the same time. Alternatively, we could live with the poor
naming if we change -instnamer.

This also adds a test to LLVM as suggested in the post-commit
review. There's a clang test that is also affected. That seems
like a layering violation, but I have not looked at fixing that yet.

Differential Revision: https://reviews.llvm.org/D80584
2020-06-01 06:54:45 -04:00
James Henderson
129a1ceed7 [llvm-dwarfdump][test] Use verbose output to check expected opcodes
The debug_line_invalid.test test case was previously using the
interpreted line table dumping to identify which opcodes have been
parsed. This change moves to looking for the expected opcodes
explicitly. This is probably a little clearer and also allows for
testing some cases that wouldn't be easily identifiable from the
interpreted table.

Reviewed by: MaskRay

Differential Revision: https://reviews.llvm.org/D80795
2020-06-01 11:48:02 +01:00
Simon Pilgrim
61aa867bc8 ARMFrameLowering.h - remove unnecessary includes. NFC.
They are implicitly included in TargetFrameLowering.h and only ever used in TargetFrameLowering override methods.
2020-06-01 11:47:13 +01:00