1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
Commit Graph

210923 Commits

Author SHA1 Message Date
wlei
a12b3252a9 Revert "[CSSPGO][llvm-profgen] Compress recursive cycles in calling context"
This reverts commit 0609f257dc2e2c3e4c7cd30fe2ffd520117e706b.
2021-02-03 22:16:05 -08:00
wlei
924cc19d25 Revert "[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation"
This reverts commit 1714ad2336293f351b15dd4b518f9e8618ec38f2.
2021-02-03 22:16:05 -08:00
Petr Hosek
b7bd8d62b5 [NFC] Fix the noprofile attribute comment 2021-02-03 21:54:09 -08:00
Chuanqi Xu
9aeeaeef76 [NFC][Coroutine] Remove redundant comment
The functionallity in the TODO was added before:
https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b
2021-02-04 12:54:30 +08:00
Kazu Hirata
d6e82443c7 [Transforms/IPO] Use range-based for loops (NFC) 2021-02-03 20:41:20 -08:00
Kazu Hirata
7c818b1f11 [TableGen] Use ListSeparator (NFC) 2021-02-03 20:41:18 -08:00
Kazu Hirata
03a18c0465 [Support] Drop unnecessary const from return types (NFC)
Identified with const-return-type.
2021-02-03 20:41:16 -08:00
wlei
99976f0cf2 [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation
For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.

Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110
2021-02-03 18:50:14 -08:00
wlei
4683e274de [CSSPGO][llvm-profgen] Compress recursive cycles in calling context
This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]

Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.

Differential Revision: https://reviews.llvm.org/D93556
2021-02-03 18:50:14 -08:00
Michael Kruse
930857b772 [OpenMPIRBuilder] Implement collapseLoops.
The collapseLoops method implements a transformations facilitating the implementation of the collapse-clause. It takes a list of loops from a loop nest and reduces it to a single loop that can be used by other methods that are implemented on just a single loop, such as createStaticWorkshareLoop.

This patch shares some changes with D92974 (such as adding some getters to CanonicalLoopNest), used by both patches.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93268
2021-02-03 19:12:02 -06:00
wlei
0adbe76ec7 [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation
This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.

Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**Implementation**

- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reading range counter is responsible for recording function body samples and inferring caller's body samples.
- `populateBoundarySamplesWithProbes` reading branch counter is responsible for recording call site target samples.
- Each sample is recorded with its calling context(named `ContextId`). Remind that the probe based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two part: one from the probe stack strings' concatenation and other one from the leaf frame probe.
- Added regression test

Test Plan:

ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92998
2021-02-03 16:21:53 -08:00
Jessica Paquette
fb07198b1d [AArch64][GlobalISel] Change store value type from p0 -> s64 to import patterns
Similar to the G_PTR_ADD + G_LOAD twiddling we do in `preISelLower`.

The imported patterns expect scalars only, so they can't handle things like

```
 G_STORE %ptr1, %ptr2
```

To get around this, use s64 instead.

(This probably makes a good portion of the manual selection code for G_STORE
dead.)

This is a 0.2% geomean code size improvement on CTMark at -Os.

(Best is consumer-typeset @ -0.7%)

Differential Revision: https://reviews.llvm.org/D95908
2021-02-03 16:19:16 -08:00
Nico Weber
1c6798fa1f Revert "[InstrProfiling] Use !associated metadata for counters, data and values"
This reverts commit 97ba5cde52664200819446c1a18de28faf2ed1c6.
Still breaks tests: https://reviews.llvm.org/D76802#2540647
2021-02-03 19:14:34 -05:00
Jessica Paquette
19febf3e8e [AArch64][GlobalISel] Emit G_ASSERT_ZEXT in assignValueToAddress for ZExt params
When we have a zeroext parameter coming in on the stack, build

```
%x = G_LOAD ...
%x_assert_zext = G_ASSERT_ZEXT %x, narrow_size
%trunc = G_TRUNC %x_assert_zext
```

Rather than just loading into the truncated type.

This allows us to optimize cases like this: https://godbolt.org/z/vfjhW8

Differential Revision: https://reviews.llvm.org/D95805
2021-02-03 16:06:05 -08:00
Florian Hahn
b7ef828be9 Revert "[LTO] Use lto::backend for code generation."
This reverts commit 6a59f0560648b43324b5aed51b9ef996404a25e0, because
it is causing failures on green dragon.
2021-02-03 22:49:30 +00:00
Florian Hahn
66a5e06679 Revert "[LTO] Add option enable NewPM with LTOCodeGenerator."
This reverts commit 7a6a2cc81aaf064e6f5bc9a9a16973f552d2bdc2 because
it is causing failures on green dragon.
2021-02-03 22:49:20 +00:00
Florian Hahn
b7fb55584a Revert "[LTOCodeGenerator] Use lto::Config for options (NFC)."
This reverts commit 0d487cf87aa1b609b7db061def3e5ad068576ecf because
it is causing failures on green dragon.
2021-02-03 22:48:54 +00:00
Arthur Eubanks
c05f64384e Turn on the new pass manager by default
This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).

This does not affect the backend target-dependent codegen pipeline.

If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.

Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline

https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html

Reviewed By: asbirlea, ychen, MaskRay, echristo

Differential Revision: https://reviews.llvm.org/D95380
2021-02-03 14:37:46 -08:00
Amara Emerson
cb21fc8971 [GlobalISel] Add sext(constant) -> constant artifact combine.
This is the G_SEXT counterpart to the existing G_ZEXT/G_ANYEXT combines.

Differential Revision: https://reviews.llvm.org/D95729
2021-02-03 14:10:08 -08:00
Arthur Eubanks
213c28c3fc [NewPM][HelloWorld] Move HelloWorld to Utils
To prevent creating a new component, which creates a new library.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D95907
2021-02-03 12:59:40 -08:00
Richard Smith
d38c68eaec Fix overflowing signed left shift, found by ubsan buildbot. 2021-02-03 12:51:39 -08:00
Krzysztof Parzyszek
65e46aa23b [Hexagon] Add LLVM instruction definitions for Hexagon V68 2021-02-03 13:59:34 -06:00
Rong Xu
8e87532070 [SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker
This patch detaches SampleProfileLoader from class
SampleCoverageTracker. We plan to move SampleProfileLoader
to a template class. This would remain SampleCoverageTracker
as a class.
Also make callsiteIsHot() as a file static function.

Differential Revision: https://reviews.llvm.org/D95823
2021-02-03 11:38:04 -08:00
Justin Bogner
030339eb55 [GlobalISel] Combine narrowScalar of G_ADD and G_SUB. NFC
These two cases have identical implementations other than an
unreachable part of `G_ADD` that checks if the scalar we're narrowing
is a vector. Combining them to avoid unnecessary divergence.
2021-02-03 11:06:04 -08:00
Matt Arsenault
0db1ff1c80 RegisterCoalescer: Fix not setting undef on coalesced subregister uses
This was only adding undef to the use if the copy itself had a
subregister index. It did not consider the subrange liveness if the
use had a subreg index to begin with.
2021-02-03 13:54:43 -05:00
Matt Arsenault
a94851d3fb RegisterCoalescer: Prune undef subranges from copy pairs in loops
If we had a pair of copies inside a loop which introduced new liveness
to a subregister which was undef before the loop, we would have a
dummy phi-only segment remaining across the loop body. Later, this
false segment would confuse RenameIndependentSubregs causing it to
introduce IMPLICIT_DEFs with broken value numbering.

It seems always adding the lanes to ShrinkMask is OK, so any
conditions should be purely a compile time filter.
2021-02-03 13:42:53 -05:00
Matt Arsenault
89df6b0538 Revert "AMDGPU: Don't consider global pressure when bundling soft clauses"
This reverts commit 1e377a273f59375d8e6a424f66f069b3adfa1ca4.

A regression was reported.
2021-02-03 13:25:05 -05:00
Stanislav Mekhanoshin
8839f6fa24 [AMDGPU] Added -mcpu to couple more tests. NFC. 2021-02-03 10:20:18 -08:00
Craig Topper
40e2c0da6c [DAGCombiner] Remove (sra (shl X, C), C) if X has more than C sign bits.
If sext_inreg is supported, we will turn this into sext_inreg. That
will then remove it if there are enough sign bits. But if sext_inreg
isn't supported, we can still remove the shift pair based on sign
bits.

Split from D95890.
2021-02-03 10:18:40 -08:00
Jeremy Morse
c6219f93d5 Revert "[DWARF] Location-less inlined variables should not have DW_TAG_variable"
This reverts commit ddc2f1e3fb4f8f9ae7dd130e40b60cfc775eba24.

A build-bot objected:

  http://lab.llvm.org:8011/#builders/105/builds/5486
2021-02-03 17:54:33 +00:00
Florian Hahn
52a13d4927 [VPlan] Manage induction value creation using VPValues.
This patch updates the induction value creation to use VPValues of
recipes to map the created values. This should bring is one step closer
to being able to optimize induction recipes directly in VPlan.

Currently widenIntOrFpInduction also generates vector values for a cast
of the induction, if it exists. Make this explicit by adding the cast
instruction to the values defined by the recipe.

Reviewed By: gilr

Differential Revision: https://reviews.llvm.org/D92284
2021-02-03 17:45:03 +00:00
Jeremy Morse
e973e6c287 [DWARF] Location-less inlined variables should not have DW_TAG_variable
Discussed in this thread:

  https://lists.llvm.org/pipermail/llvm-dev/2021-January/148139.html

DwarfDebug::collectEntityInfo accidentally distinguishes between variable
locations that never have a location specified, and variable locations that
have an empty location specified. The latter leads to the creation of an
empty variable referring to the abstract origin.

Fix this by seeking a non-empty location before producing a concrete
entity, to guarantee a DW_AT_location will be produced. Other loops in
collectEntityInfo and endFunctionImpl take care of examining the
retainedNodes collection and ensuring optimised-out variables are created.

Differential Revision: https://reviews.llvm.org/D95617
2021-02-03 17:32:31 +00:00
Krzysztof Parzyszek
8b55662616 [Hexagon] Add ELF flags for Hexagon V68 2021-02-03 11:02:59 -06:00
Florian Hahn
371aed9abb [ConstraintElimination] Add some tests with conds in loop header.
This patch adds a set of tests in which we can add the information from
the pre-header to a loop header, but currently do not do so.
2021-02-03 16:40:43 +00:00
Jay Foad
da98ff117e [AMDGPU] Fix multiclass template parameter types. NFC.
This fixes TableGen parser errors that will be reported when D95874 is
applied.

Differential Revision: https://reviews.llvm.org/D95955
2021-02-03 16:21:51 +00:00
Juneyoung Lee
bda396ca51 Revert "[ConstantFold] Fold more operations to poison"
This reverts commit 53040a968dc2ff20931661e55f05da2ef8b964a0 due to its
bad interaction with select i1 -> and/or i1 transformation.

This fixes:
https://bugs.llvm.org/show_bug.cgi?id=49005
https://bugs.llvm.org/show_bug.cgi?id=48435
2021-02-04 00:24:02 +09:00
Abhina Sreeskantharajan
e6f220185d [test] Use host platform specific error message substitution in lit tests - continued
On z/OS, other error messages are not matched correctly in lit tests.

```
EDC5121I Invalid argument.
EDC5111I Permission denied.
```

This patch adds a lit substitution to fix it.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D95808
2021-02-03 09:53:22 -05:00
Matt Arsenault
6eade37b9b AMDGPU: Move handling of allocation of fixed ABI inputs
For the fixed ABI, set this in the initial argument constructor,
rather than relying on the allocation logic to set the values. Also
stop passing them for amdgpu_gfx, since the DAG path seems to skip
these. I'm unclear on what amdgpu_gfx's expectations are.  This will
allow moving the special input registers out of the normal argument
range.
2021-02-03 09:27:59 -05:00
Sanjay Patel
8d90e2b13c [LoopVectorize] add test for fake min/max; NFC
This goes with the dyn_cast fix:
0fa61304d247a61

That was made after noticing that the assert was over-reaching here:
bbed5f2f8a ( D95690 )
2021-02-03 09:24:57 -05:00
Simon Pilgrim
295b5dea73 [X86][SSE] Support variable-index float/double vector insertion on SSE41+ targets (PR47924)
Extends D95779 to permit insertion into float/doubles vectors while avoiding a lot of aliased memory traffic.

The scalar value is already on the simd unit, so we only need to transfer and splat the index value, then perform the select.

SSE4 codegen is a little bulky due to the tied register requirements of (non-VEX) BLENDPS/PD but the extra moves are cheap so shouldn't be an actual problem.

Differential Revision: https://reviews.llvm.org/D95866
2021-02-03 14:14:35 +00:00
Sebastian Neubauer
1895fb4bef Revert "[AMDGPU] Add a new Clamp Pattern to the GlobalISel Path."
This reverts commits 62af0305b7cc..677a3529d3e6 from D93708.
They cause failures in the sanitizer builds because of uninitialized
values.

A fix is in D95878, but it might take some time until this is pushed,
so reverting the changes for now.
2021-02-03 11:03:34 +01:00
Caroline Concatto
ec0cfd98d5 [AArch64][SVE]Add cost model for broadcast shuffle
This patch adds a cost model for  SK_Broadcast in
AArch64TTIImpl::getShuffleCost with scalable vector.
Without this patch, the scalable vector type relies on  BasicTTIImpl cost
implementation and assert.

Differential Revision: https://reviews.llvm.org/D95598
2021-02-03 09:53:22 +00:00
David Sherwood
9ead40bd43 [VPlan][NFC] Introduce constructors for VPIteration
This patch adds constructors to VPIteration as a cleaner way of
initialising the struct and replaces existing constructions of
the form:

  {Part, Lane}

with

  VPIteration(Part, Lane)

I have also added a default constructor, which is used by VPlan.cpp
when deciding whether to replicate a block or not.

This refactoring will be required in a later patch that adds more
members and functions to VPIteration.

Differential Revision: https://reviews.llvm.org/D95676
2021-02-03 08:52:27 +00:00
Wang, Pengfei
a49cc783f7 [X86] Correct types in tablegen multiclasses found by D95874.
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95926
2021-02-03 16:05:05 +08:00
Petr Hosek
0778936d8e [InstrProfiling] Use !associated metadata for counters, data and values
C identifier name input sections such as __llvm_prf_* are GC roots so
they cannot be discarded. In LLD, the SHF_LINK_ORDER flag overrides the
C identifier name semantics.

The !associated metadata may be attached to a global object declaration
with a single argument that references another global object, and it
gets lowered to SHF_LINK_ORDER flag. When a function symbol is discarded
by the linker, setting up !associated metadata allows linker to discard
counters, data and values associated with that function symbol.

Note that !associated metadata is only supported by ELF, it does not have
any effect on non-ELF targets.

Differential Revision: https://reviews.llvm.org/D76802
2021-02-02 23:19:51 -08:00
Kazu Hirata
01a807c55f [AsmPrinter] Use ListSeparator (NFC) 2021-02-02 22:52:48 -08:00
Kazu Hirata
482abd98f7 [Transforms/Utils] Use range-based for loops (NFC) 2021-02-02 22:52:47 -08:00
Kazu Hirata
90c5ea15f9 [CodeGen] Drop unnecessary const from return types (NFC)
Identified with const-return-type.
2021-02-02 22:52:45 -08:00
LLVM GN Syncbot
6b9c4c8b9d [gn build] Port fcf03e728007 2021-02-03 05:46:53 +00:00
Hsiangkai Wang
511cdce7af [RISCV] Load/store vector mask types.
Use vle1.v/vse1.v to load/store vector mask types.

Differential Revision: https://reviews.llvm.org/D93364
2021-02-03 13:44:15 +08:00