Part of the <=> changes in C++20 make certain patterns of writing equality
operators ambiguous with themselves (sorry!).
This patch goes through and adjusts all the comparison operators such that
they should work in both C++17 and C++20 modes. It also makes two other small
C++20-specific changes (adding a constructor to a type that cases to be an
aggregate, and adding casts from u8 literals which no longer have type
const char*).
There were four categories of errors that this review fixes.
Here are canonical examples of them, ordered from most to least common:
// 1) Missing const
namespace missing_const {
struct A {
#ifndef FIXED
bool operator==(A const&);
#else
bool operator==(A const&) const;
#endif
};
bool a = A{} == A{}; // error
}
// 2) Type mismatch on CRTP
namespace crtp_mismatch {
template <typename Derived>
struct Base {
#ifndef FIXED
bool operator==(Derived const&) const;
#else
// in one case changed to taking Base const&
friend bool operator==(Derived const&, Derived const&);
#endif
};
struct D : Base<D> { };
bool b = D{} == D{}; // error
}
// 3) iterator/const_iterator with only mixed comparison
namespace iter_const_iter {
template <bool Const>
struct iterator {
using const_iterator = iterator<true>;
iterator();
template <bool B, std::enable_if_t<(Const && !B), int> = 0>
iterator(iterator<B> const&);
#ifndef FIXED
bool operator==(const_iterator const&) const;
#else
friend bool operator==(iterator const&, iterator const&);
#endif
};
bool c = iterator<false>{} == iterator<false>{} // error
|| iterator<false>{} == iterator<true>{}
|| iterator<true>{} == iterator<false>{}
|| iterator<true>{} == iterator<true>{};
}
// 4) Same-type comparison but only have mixed-type operator
namespace ambiguous_choice {
enum Color { Red };
struct C {
C();
C(Color);
operator Color() const;
bool operator==(Color) const;
friend bool operator==(C, C);
};
bool c = C{} == C{}; // error
bool d = C{} == Red;
}
Differential revision: https://reviews.llvm.org/D78938
in ExtBinary format.
Currently ExtBinary format doesn't support multiple sections with the same type
in the profile. We add the support in this patch. Previously we use the section
type to identify a section uniquely. Now we introduces a LayoutIndex in the
SecHdrTableEntry and use the LayoutIndex to locate the target section. The
allocations of NameTable and FuncOffsetTable are adjusted accordingly.
Currently it works as a NFC because it won't change anything for current layout.
The test for multiple sections support will be included in another patch where a
new type of profile containing multiple sections with the same type is
introduced.
Differential Revision: https://reviews.llvm.org/D93254
This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, aka, a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be slipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during the profile loading time. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future a CFG-based fuzzy matching could be done in `PseudoProbeManager`.
Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites).
**Adjustment to profile format **
A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like
```
!CFGChecksum: 12345
```
is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums.
Differential Revision: https://reviews.llvm.org/D92347
gcov computes the line execution count as the sum of (a) counts from
predecessors on other lines and (b) the sum of loop execution counts of blocks
on the same line (think of loops on one line).
For (b), we use Donald B. Johnson's cycle enumeration algorithm and perform
cycle cancelling for each cycle. This number of candidate cycles were
exponential and D93036 made it polynomial by skipping zero count cycles. The
time complexity is high (O(V*E^2) (it could be O(E^2) but the linear `Blocks`
check made it higher) and the implementation is complex.
We could just identify loops and sum all back edges. However, this requires a
dominator tree construction which is more complex. The time complexity can be
decreased to almost linear, though.
This patch just performs cycle cancelling iteratively. Add two members
`traversable` and `incoming` to GCOVArc. There are 3 states:
* `!traversable`: blocks not on this line or explored blocks
* `traversable && incoming == nullptr`: unexplored blocks
* `traversable && incoming != nullptr`: blocks which are being explored (on the stack)
If an arc points to a block being explored, a cycle has been found.
Let E be the number of arcs. Every time a cycle is found, at least one arc is
saturated (`edgeCount` reduced to 0), so there are at most E cycles. Finding one
cycle takes O(E) time, so the overall time complexity is O(E^2). Note that we
always augment through a back edge and never need to augment its reverse edge so
reverse edges in traditional flow networks are not needed.
Reviewed By: xinhaoyuan
Differential Revision: https://reviews.llvm.org/D93073
MD5 is used.
Currently during sample profile loading, NameTable has to be loaded entirely
up front before any name string is retrieved. That is because NameTable is
stored using ULEB128 encoding and cannot be directly accessed like an array.
However, if MD5 is used to represent name in the NameTable, it has fixed
length. If MD5 names are stored in uint64_t type instead of ULEB128, NameTable
can be accessed like an array then in many cases only part of the NameTable
has to be read. This is helpful for reducing compile time especially when
small source file is compiled. We find that after this change, the elapsed
time to build a large application distributively is reduced by 5% and the
accumulative cpu time used for building is also reduced by 5%. The size of
the profile is slightly reduced with this change by ~0.2%, and that also
indicates encoding MD5 in ULEB128 doesn't save the storage space.
Differential Revision: https://reviews.llvm.org/D92621
Text section prefix is created in CodeGenPrepare, it's file format independent implementation, text section name is written into object file in TargetLoweringObjectFile, it's file format dependent implementation, port code of adding text section prefix to text section name from ELF to COFF.
Different with ELF that use '.' as concatenation character, COFF use '$' as concatenation character. That is, concatenation character is variable, so split concatenation character from text section prefix.
Text section prefix is existing feature of ELF, it can help to reduce icache and itlb misses, it's also make possible aggregate other compilers e.g. v8 created same prefix sections. Furthermore, the recent feature Machine Function Splitter (basic block level text prefix section) is based on text section prefix.
Reviewed By: pengfei, rnk
Differential Revision: https://reviews.llvm.org/D92073
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.
This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path.
we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding.
Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context.
With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO.
A breakdown of noteworthy changes:
- Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack
* Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted
* Speed up by aggregating `HybridSample` into `AggregatedSamples`
* Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context. Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`.
* Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.
* Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop
- `getCanonicalFnName` for callee name and name from ELF section
- Added regression test for both unwinding and profile generation
Test Plan:
ninja & ninja check-llvm
Reviewed By: hoy, wenlei, wmi
Differential Revision: https://reviews.llvm.org/D89723
This change adds the context-senstive sample PGO infracture described in CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduced an abstraction between input profile and profile loader that queries input profile for functions. Specifically, there's now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with top-down profiled guided inliner in profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to CGSCC inliner in order for it to take advantage of context-sensitive profile. This change is the consumption part of context-sensitive profile (The generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Pacthes for integration with Pseudo-probe coming up soon.
Currently the new infrastructure kick in when input profile contains the new context-sensitive profile; otherwise it's no-op and does not affect existing AutoFDO.
**Interface**
There're two sets of interfaces for query and tracking respectively exposed from SampleContextTracker. For query, now instead of simply getting a profile from input for a function, we can explicitly query base profile or context profile for given call path of a function. For tracking, there're separate APIs for marking context profile as inlined, or promoting and merging not inlined context profile.
- Query base profile (`getBaseSamplesFor`)
Base profile is the merged synthetic profile for function's CFG profile from any outstanding (not inlined) context. We can query base profile by function.
- Query context profile (`getContextSamplesFor`)
Context profile is a function's CFG profile for a given calling context. We can query context profile by context string.
- Track inlined context profile (`markContextSamplesInlined`)
When a function is inlined for given calling context, we need to mark the context profile for that context as inlined. This is to make sure we don't include inlined context profile when synthesizing base profile for that inlined function.
- Track not-inlined context profile (`promoteMergeContextSamplesTree`)
When a function is not inlined for given calling context, we need to promote the context profile tree so the not inlined context becomes top-level context. This preserve the sub-context under that function so later inline decision for that not inlined function will still have context profile for its call tree. Note that profile will be merged if needed when promoting a context profile tree if any of the node already exists at its promoted destination.
**Implementation**
Implementation-wise, `SampleContext` is created as abstraction for context. Currently it's a string for call path, and we can later optimize it to something more efficient, e.g. context id. Each `SampleContext` also has a `ContextState` indicating whether it's raw context profile from input, whether it's inlined or merged, whether it's synthetic profile created by compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's base profile or context profile, and for context profile what is the context and state.
On top of the above context representation, a custom trie tree is implemented to track and manager context profiles. Specifically, `SampleContextTracker` is implemented that encapsulates a trie tree with `ContextTireNode` as node. Each node of the trie tree represents a frame in calling context, thus the path from root to a node represents a valid calling context. We also track `FunctionSamples` for each node, so this trie tree can serve efficient query for context profile. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of entire tree, and merge nodes for subtree if this move encounters existing nodes.
**Integration**
`SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When we detected input profile contains context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile query will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`.
Differential Revision: https://reviews.llvm.org/D90125
to their parent classes.
SampleProfileReaderExtBinary/SampleProfileWriterExtBinary specify the typical
section layout currently used by SampleFDO. Currently a lot of section
reader/writer stay in the two classes. However, as we expect to have more
types of SampleFDO profiles, we hope those new types of profiles can share
the common sections while configuring their own sections easily with minimal
change. That is why I move some common stuff from
SampleProfileReaderExtBinary/SampleProfileWriterExtBinary to
SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase so new
profiles class inheriting from the base class can reuse them.
Differential Revision: https://reviews.llvm.org/D89524
Following up D81682 and D83903, remove the code for the old value profiling
buckets, which have been replaced with the new, extended buckets and disabled by
default.
Also syncing InstrProfData.inc between compiler-rt and llvm.
Differential Revision: https://reviews.llvm.org/D88838
llvm-cov reports a poor error message when the -arch specifier is
missing or invalid, and a binary has multiple slices. Make the error
message more specific.
(This version of the patch avoids using llvm::none_of -- the way I used
the utility caused compile errors on many bots, possibly because the
wrong overload of `none_of` was selected.)
rdar://40312677
This reverts commit b81d4bfb44c14575130bb06c047728b69c3213aa.
It's causing some bots to fail to build due to: "error: no matching
function for call to ‘__iterator_category".
llvm-cov reports a poor error message when the -arch specifier is
missing or invalid, and a binary has multiple slices. Make the error
message more specific.
rdar://40312677
The current organization of FileInfo and its referenced utility functions of
(GCOVFile, GCOVFunction, GCOVBlock) is messy. Some members of FileInfo are just
copied from GCOVFile. FileInfo::print (.gcov output and --intermediate output)
is interleaved with branch statistics and computation of line execution counts.
--intermediate has to do redundant .gcov output to gather branch statistics.
This patch deletes lots of code and introduces a clearer work flow:
```
fn collectFunction
for each block b
for each line lineNum
let line be LineInfo of the file on lineNum
line.exists = 1
increment function's lines & linesExec if necessary
increment line.count
line.blocks.push_back(&b)
fn collectSourceLine
compute cycle counts
count = incoming_counts + cycle_counts
if line.exists
++summary->lines
if line.count
++summary->linesExec
fn collectSource
for each line
call collectSourceLine
fn main
for each function
call collectFunction
print function summary
for each source file
call collectSource
print file summary
annotate the source file with line execution counts
if -i
print intermediate file
```
The output order of functions and files now follows the original order in
.gcno files.
For a CFG G=(V,E), Knuth describes that by Kirchoff's circuit law, the minimum
number of counters necessary is |E|-(|V|-1). The emitted edges form a spanning
tree. libgcov emitted .gcda files leverages this optimization while clang
--coverage's doesn't.
Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can
correctly print line counts of gcc --coverage emitted files and enable
the future improvement of clang --coverage.
and indirect call promotion candidate.
Profile remapping is a feature to match a function in the module with its
profile in sample profile if the function name and the name in profile look
different but are equivalent using given remapping rules. This is a useful
feature to keep the performance stable by specifying some remapping rules
when sampleFDO targets are going through some large scale function signature
change.
However, currently profile remapping support is only valid for outline
function profile in SampleFDO. It cannot match a callee with an inline
instance profile if they have different but equivalent names. We found
that without the support for inline instance profile, remapping is less
effective for some large scale change.
To add that support, before any remapping lookup happens, all the names
in the profile will be inserted into remapper and the Key to the name
mapping will be recorded in a map called NameMap in the remapper. During
name lookup, a Key will be returned for the given name and it will be used
to extract an equivalent name in the profile from NameMap. So with the help
of the NameMap, we can translate any given name to an equivalent name in
the profile if it exists. Whenever we try to match a name in the module to
a name in the profile, we will try the match with the original name first,
and if it doesn't match, we will use the equivalent name got from remapper
to try the match for another time. In this way, the patch can enhance the
profile remapping support for searching inline instance and searching
indirect call promotion candidate.
In a planned large scale change of int64 type (long long) to int64_t (long),
we found the performance of a google internal benchmark degraded by 2% if
nothing was done. If existing profile remapping was enabled, the performance
degradation dropped to 1.2%. If the profile remapping with the current patch
was enabled, the performance degradation further dropped to 0.14% (Note the
experiment was done before searching indirect call promotion candidate was
added. We hope with the remapping support of searching indirect call promotion
candidate, the degradation can drop to 0% in the end. It will be evaluated
post commit).
Differential Revision: https://reviews.llvm.org/D86332
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
is enabled.
When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be
created during profile merge without GUIDToFuncNameMap being initialized.
That will occasionally cause compiler crash. The patch fixes it.
Differential Revision: https://reviews.llvm.org/D84994
PGO profile is usually more precise than sample profile. However, PGO profile
needs to be collected from loadtest and loadtest may not be representative
enough to the production workload. Sample profile collected from production
can be used as a supplement -- for functions cold in loadtest but warm/hot
in production, we can scale up the related function in PGO profile if the
function is warm or hot in sample profile.
The implementation contains changes in compiler side and llvm-profdata side.
Given an instr profile and a sample profile, for a function cold in PGO
profile but warm/hot in sample profile, llvm-profdata will either mark
all the counters in the profile to be -1 or scale up the max count in the
function to be above hot threshold, depending on the zero counter ratio in
the profile. The assumption is if there are too many counters being zero
in the function profile, the profile is more likely to cause harm than good,
then llvm-profdata will mark all the counters to be -1 indicating the
function is hot but the profile is unaccountable. In compiler side, if a
function profile with all -1 counters is seen, the function entry count will
be set to be above hot threshold but its internal profile will be dropped.
In the long run, it may be useful to let compiler support using PGO profile
and sample profile at the same time, but that requires more careful design
and more substantial changes to make two profiles work seamlessly. The patch
here serves as a simple intermediate solution.
Differential Revision: https://reviews.llvm.org/D81981
This reverts commit 4a539faf74b9b4c25ee3b880e4007564bd5139b0.
There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:
```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV
const*, llvm::SCEV const*, unsigned int) ()
```
(The body of __llvm_profile_instrument_range is inlined, so we can only find__llvm_profile_instrument_target in the trace)
```
23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1)
24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1)
25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx
26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV
27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx)
28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342>
```
This patch includes the supporting code that enables always
instrumenting the function entry block by default.
This patch will NOT the default behavior.
It adds a variant bit in the profile version, adds new directives in
text profile format, and changes llvm-profdata tool accordingly.
This patch is a split of D83024 (https://reviews.llvm.org/D83024)
Many test changes from D83024 are also included.
Differential Revision: https://reviews.llvm.org/D84261
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Change file static function getEntryForPercentile to be a static member function
in ProfileSummaryBuilder so it can be used by other files.
Differential Revision: https://reviews.llvm.org/D83439
Extend the memop value profile buckets to be more flexible (could accommodate a
mix of individual values and ranges) and to cover more value ranges (from 11 to
22 buckets).
Disabled behind a flag (to be enabled separately) and the existing code to be
removed later.
Differential Revision: https://reviews.llvm.org/D81682
Summary:
Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.
Test Plan:
Reviewers: wenlei, hoyFB
Reviewed By: wenlei, hoyFB
Subscribers: hoyFB, wenlei, weihe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82355
Summary: Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample count are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names.
Reviewers: wmi, hoyFB, wenlei
Reviewed By: wmi
Subscribers: hoyFB, wenlei, llvm-commits, weihe
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81800
If .gcda is corrupted, gcov continues to produce a .gcov and just
assumes execution counts are zeros. This is reasonable, because the
program can corrupt its .gcda output. The code path should be similar to
the code path without .gcda.
Between gcov 4.9~8, `gcov -i $file` prints coverage information to
$file.gcov in an intermediate text format (single file, instead of
$source.gcov for each source file).
lcov newer than 2019-05-24 detects -i support and uses it to increase
processing speed. gcov 9 (GCC r265587) removed --intermediate-format
and -i was changed to mean --json-format. However, we consider this
format still useful and support it. geninfo (part of lcov) supports this
format even if we announce that we are compatible with gcov 9.0.0
global-ctor.ll no longer checks what it intended to check
(@_GLOBAL__sub_I_global-ctor.ll needs a !dbg to work).
Rewrite it.
gcov 3.4 and gcov 4.2 use the same format, thus we can lower the version
requirement to 3.4
llvm-cov.test and many Inputs/test* files contain wrong tests.
This patch rewrites a large portion of these files.
The pre-canned .gcno & .gcda are replaced by binaries produced by
clang --coverage (compatible with gcov 4.8~7)
(after some GCDAProfiling.c bugs were fixed by my previous commits).
Also make llvm-cov gcov on a little-endian host capable to parse big-endian .gcno and .gcda,
and make llvm-cov gcov on big-endian host capable to parse little-endian .gcno and .gcda
And bump its version number accordingly.
This is a patched recommit of 7c298c104bfe725d4315926a656263e8a5ac3054
Previous hash implementation was incorrectly passing an uint64_t, that got converted
to an uint8_t, to finalize the hash computation. This led to different functions
having the same hash if they only differ by the remaining statements, which is
incorrect.
Added a new test case that trivially tests that a small function change is
reflected in the hash value.
Not that as this patch fixes the hash computation, it would invalidate all hashes
computed before that patch applies, this is why we bumped the version number.
Update profile data hash entries due to hash function update, except for binary
version, in which case we keep the buggy behavior for backward compatibility.
Differential Revision: https://reviews.llvm.org/D79961
gcov 4.8 (r189778) moved the exit block from the last to the second.
The .gcda format is compatible with 4.7 but
* decoding libgcov 4.7 produced .gcda with gcov [4.7,8) can mistake the
exit block, emit bogus `%s:'%s' has arcs from exit block\n` warnings,
and print wrong `" returned %s` for branch statistics (-b).
* decoding libgcov 4.8 produced .gcda with gcov 4.7 has similar issues.
Also, rename "return block" to "exit block" because the latter is the
appropriate term.
This patch addresses two issues related to adding inline functions to the import list while recursively going through the profiling data.
1. For callsite samples, only add an inlined function to the import list if it's from outside of the module (i.e. only has a declaration inside the module).
2. For body samples, add each target function to the import list if it's from outside of the module (i.e. only has a declaration inside the module). Previously we were using getSubProgram() to check whether it has dbg info, which is inaccurate. This fix properly add imports and could improve the quality of the pass.
Added a few changes to the test to catch these cases.
Differential Revision: https://reviews.llvm.org/D79379
Summary:
In D65848 the function getFuncNameInModule was refactored to no longer use module.
This diff removes the parameter and rename the function name to avoid confusion.
Reviewers: wenlei, wmi, davidxl
Reviewed By: wenlei
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79310
gcov by default prints to a .gcov file. With --stdout, stdout is used.
Some summary information is omitted. There is no separator for multiple
source files.
Even a 2003 version of gcov_read_string does not have the behavior
described by rL204881. The wrong impression might come from
libclang_rt.profile (GCDAProfiling.c)'s corrupted output (since the
initial import rL144865). Note, the corrupted output crashes gcov.
GCDAProfiling.c unnecessarily writes function names to .gcda files.
GCC 4.2 gcc/libgcov.c (now renamed to libgcc/libgcov*) did not write function
names. gcov-7 (compatible) crashes on .gcda produced by libclang_rt.profile
rL176173 realized the problem and introduced a mode to remove function
names.
llvm-cov code apparently takes GCDAProfiling.c output format as truth
and tries to decode function names. Additionally, llvm-cov tries to
decode tags in certain order which does not match libgcov emitted .gcda
files.
This patch fixes the .gcda decoder and makes it work with GCC 8 and 9
(10 is compatible with 9). Note, line statistics are broken and not
fixed by this patch.
Add test/tools/llvm-cov/gcov-{4.7,8,9}.c to test compatibility.
Fix the error of show-prof-info.test on some platforms without zlib.
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them.
Differential Revision: https://reviews.llvm.org/D77426
Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it.
Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support.
Differential Revision: https://reviews.llvm.org/D76255
After the format change from D69471, there can be more than one section
in an object that contains coverage function records. Look up each of
these sections and concatenate all the records together.
This re-enables the instrprof-merging.cpp test, which previously was
failing on OSes which use comdats.
Thanks to Jeremy Morse, who very kindly provided object files from the
bot I broke to help me debug.
http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/27075
error: non-constant-expression cannot be narrowed from type 'uint64_t'
(aka 'unsigned long long') to 'size_t' (aka 'unsigned int') in
initializer list [-Wc++11-narrowing]
return {MappingBuf, getDataSize<FuncRecordTy, Endian>(Record)};
Try again with an up-to-date version of D69471 (99317124 was a stale
revision).
---
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
StringMap.h is very popular (4K uses), and it doesn't need to see
BumpPtrAllocator, which is relatively expensive according to
ClangBuildAnalyzer. StringMap only needs MallocAllocator, so split that
into AllocatorBase.h and use it instead.
Here is the change in header uses:
$ diff -u thedeps-before.txt thedeps-after.txt | \
grep '^[-+] ' | sort | uniq -c | sort -nr
3993 + ../llvm/include/llvm/Support/AllocatorBase.h
758 - ../llvm/include/llvm/Support/Allocator.h
270 - ../llvm/include/llvm/Support/Alignment.h
13 - ../llvm/include/llvm/Support/Host.h
6 - ../llvm/include/llvm/ADT/StringMap.h
4 - ../llvm/include/llvm/Support/SwapByteOrder.h
4 - ../llvm/include/llvm/Support/MathExtras.h
4 - ../llvm/include/llvm/Support/AlignOf.h
4 - ../llvm/include/llvm/ADT/SmallVector.h
1 - ../llvm/include/llvm/Support/PointerLikeTypeTraits.h
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73392
This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.
This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.
The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).
Differential Revision: https://reviews.llvm.org/D69740
Suppose an inline instance has hot total sample count but 0 entry count, and
it is an indirect call target. If the indirect call has no other call target
and inline instance associated with it and it is promoted, currently the
conditional branch generated by indirect call promotion will have invalid
branch profile which is !{!"branch_weights", i32 0, i32 0} -- because the
entry count of the promoted target is 0 and the total entry count of all
targets is also 0. This caused a SEGV in Control Height Reduction and may
cause problem in other passes.
Function entry count of an inline instance is computed by a heuristic --
using either the sample of the starting line or starting inner inline
instance. The patch changes the heuristic a little bit so that when total
sample count is larger than 0, the computed entry count will be at least 1.
Then the new branch profile will be !{!"branch_weights", i32 1, i32 0}.
Differential Revision: https://reviews.llvm.org/D72790
Summary:
When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases:
- Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile.
- Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay.
A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually.
Reviewers: wmi, davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70653
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Add support for continuously syncing profile counter updates to a file.
The motivation for this is that programs do not always exit cleanly. On
iOS, for example, programs are usually killed via a signal from the OS.
Running atexit() handlers after catching a signal is unreliable, so some
method for progressively writing out profile data is necessary.
The approach taken here is to mmap() the `__llvm_prf_cnts` section onto
a raw profile. To do this, the linker must page-align the counter and
data sections, and the runtime must ensure that counters are mapped to a
page-aligned offset within a raw profile.
Continuous mode is (for the moment) incompatible with the online merging
mode. This limitation is lifted in https://reviews.llvm.org/D69586.
Continuous mode is also (for the moment) incompatible with value
profiling, as I'm not sure whether there is interest in this and the
implementation may be tricky.
As I have not been able to test extensively on non-Darwin platforms,
only Darwin support is included for the moment. However, continuous mode
may "just work" without modification on Linux and some UNIX-likes. AIUI
the default value for the GNU linker's `--section-alignment` flag is set
to the page size on many systems. This appears to be true for LLD as
well, as its `no_nmagic` option is on by default. Continuous mode will
not "just work" on Fuchsia or Windows, as it's not possible to mmap() a
section on these platforms. There is a proposal to add a layer of
indirection to the profile instrumentation to support these platforms.
rdar://54210980
Differential Revision: https://reviews.llvm.org/D68351
by ExtBinary format profile
Profile on-demand loading was added for ExtBinary format profile in rL374233,
but currently profile on-demand loading doesn't work well with profile
remapping. The patch adds the support.
Suppose a function in the current module has outline instance in the profile.
The function name in the module is different from the name of the outline
instance, but remapper knows the two names are equal. When loading profile
on-demand, the outline instance has to be loaded with remapper's help.
At the same time SampleProfileReaderItaniumRemapper is changed from a proxy
of SampleProfileReader to a helper member in SampleProfileReader.
Differential Revision: https://reviews.llvm.org/D68901
llvm-svn: 375295
in ExtBinary format
Currently for Text, Binary and ExtBinary format profiles, when we compile a
module with samplefdo, even if there is no function showing up in the profile,
we have to load all the function profiles from the profile input. That is a
waste of compile time.
CompactBinary format profile has already had the support of loading function
profiles on demand. In this patch, we add the support to load profile on
demand for ExtBinary format. It will work no matter the sections in ExtBinary
format profile are compressed or not. Experiment shows it reduces the time to
compile a server benchmark by 30%.
When profile remapping and loading function profiles on demand are both used,
extra work needs to be done so that the loading on demand process will take
the name remapping into consideration. It will be addressed in a follow-up
patch.
Differential Revision: https://reviews.llvm.org/D68601
llvm-svn: 374233
Previously ExtBinary profile format only supports compression using zlib for
profile symbol list. In this patch, we extend the compression support to any
section. User can select some or all of the sections to compress. In an
experiment, for a 45M profile in ExtBinary format, compressing name table
reduced its size to 24M, and compressing all the sections reduced its size
to 11M.
Differential Revision: https://reviews.llvm.org/D68253
llvm-svn: 373914
With this patch, compiler generated profile variables will have its own COMDAT
name for ELF format, which syncs the behavior with COFF. Tested with clang
PGO bootstrap. This shows a modest reduction in object sizes in ELF format.
Differential Revision: https://reviews.llvm.org/D68041
llvm-svn: 373241
or the size of the profile for profile in ExtBinary format.
Fix a test failure on Mac.
[SampleFDO] Expose an interface to return the size of a section or the
size of the profile for profile in ExtBinary format.
Sometimes we want to limit the size of the profile by stripping some functions
with low sample count or by stripping some function names with small text size
from profile symbol list. That requires the profile reader to have the
interfaces returning the size of a section or the size of total profile. The
patch add those interfaces.
At the same time, add some dump facility to show the size of each section.
Differential revision: https://reviews.llvm.org/D67726
llvm-svn: 372478
of the profile for profile in ExtBinary format.
Sometimes we want to limit the size of the profile by stripping some functions
with low sample count or by stripping some function names with small text size
from profile symbol list. That requires the profile reader to have the
interfaces returning the size of a section or the size of total profile. The
patch add those interfaces.
At the same time, add some dump facility to show the size of each section.
llvm-svn: 372439
is enabled.
We can save memory and reduce binary size significantly by enabling
ProfileSampleAccurate. However when ProfileSampleAccurate is true,
function without sample will be regarded as cold and this could
potentially cause performance regression.
To minimize the potential negative performance impact, we want to be
a little conservative here saying if a function shows up in the profile,
no matter as outline instance, inline instance or call targets, treat
the function as not being cold. This will handle the cases such as most
callsites of a function are inlined in sampled binary (thus outline copy
don't get any sample) but not inlined in current build (because of source
code drift, imprecise debug information, or the callsites are all cold
individually but not cold accumulatively...), so that the outline function
showing up as cold in sampled binary will actually not be cold after current
build. After the change, such function will be treated as not cold even
profile-sample-accurate is enabled.
At the same time we lower the hot criteria of callsiteIsHot check when
profile-sample-accurate is enabled. callsiteIsHot is used to determined
whether a callsite is hot and qualified for early inlining. When
profile-sample-accurate is enabled, functions without profile will be
regarded as cold and much less inlining will happen in CGSCC inlining pass,
so we can worry less about size increase and be aggressive to allow more
early inlining to happen for warm callsites and it is helpful for performance
overall.
Differential Revision: https://reviews.llvm.org/D67561
llvm-svn: 372232
Speed up queries for coverage info in a file by reducing the amount of
time spent determining whether a function record corresponds to a file.
This gives a 36% speedup when generating a coverage report for `llc`.
The reduction is entirely in user time.
rdar://54758110
Differential Revision: https://reviews.llvm.org/D67575
llvm-svn: 372025
The check needs to validate a counter offset before performing pointer
arithmetic with the (potentially corrupt) offset.
Found by UBSan's pointer overflow check.
rdar://54843625
Differential Revision: https://reviews.llvm.org/D66979
llvm-svn: 370826
1. zlib::compress accept &size_t but the param is an uint64_t.
2. Some systems don't have zlib installed. Don't use compression by default.
llvm-svn: 370564
cold versus function being newly added.
This is the second half of https://reviews.llvm.org/D66374.
Profile symbol list is the collection of function symbols showing up in
the binary which generates the current profile. It is used to discriminate
function being cold versus function being newly added. Profile symbol list
is only added for profile with ExtBinary format.
During profile use compilation, when profile-sample-accurate is enabled,
a function without profile will be regarded as cold only when it is
contained in that list.
Differential Revision: https://reviews.llvm.org/D66766
llvm-svn: 370563
Before this change, if multiple binary files were presented, all of them must have been instrumented or the load would fail with coverage_map_error::no_data_found.
Patch by Dean Sturtevant.
Differential Revision: https://reviews.llvm.org/D66763
llvm-svn: 370257
This is a followup of https://reviews.llvm.org/D66513. The code calling each
section reader should be put into a separate function (readOneSection), so
SampleProfileExtBinaryReader can override it. Otherwise, the base class
SampleProfileExtBinaryBaseReader will need to be aware of all different kinds
of section readers. That is not right.
Differential Revision: https://reviews.llvm.org/D66693
llvm-svn: 369919
This is a patch split from https://reviews.llvm.org/D66374. It tries to add
a new format of profile called ExtBinary. The format adds a section header
table to the profile and organize the profile in sections, so the future
extension like adding a new section or extending an existing section will be
easier while keeping backward compatiblity feasible.
Differential Revision: https://reviews.llvm.org/D66513
llvm-svn: 369798
We currently get this warning when compiling with libc++:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/set:454:26: warning: the specified comparator type does not provide a const call operator [-Wuser-defined-warnings]
static_assert(sizeof(__diagnose_non_const_comparator<_Key, _Compare>()), "");
^
llvm-project/llvm/include/llvm/ProfileData/SampleProf.h:193:29: note: in instantiation of template class 'std::__1::set<std::__1::pair<llvm::StringRef, unsigned long long>, llvm::sampleprof::SampleRecord::CallTargetComparator, std::__1::allocator<std::__1::pair<llvm::StringRef, unsigned long long> > >' requested here
const SortedCallTargetSet getSortedCallTargets() const {
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__tree:967:5: note: from 'diagnose_if' attribute on '__diagnose_non_const_comparator<std::__1::pair<llvm::StringRef, unsigned long long>, llvm::sampleprof::SampleRecord::CallTargetComparator>':
_LIBCPP_DIAGNOSE_WARNING(!std::__invokable<_Compare const&, _Tp const&, _Tp const&>::value,
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1320:21: note: expanded from macro '_LIBCPP_DIAGNOSE_WARNING'
__attribute__((diagnose_if(__VA_ARGS__, "warning")))
^ ~~~~~~~~~~~
1 warning generated.
llvm-svn: 369500
Summary:
StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.
Roundtrip test for text profile and binary profile is added.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, mgrang, llvm-commits, twoh
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66191
llvm-svn: 369440
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.
llvm-svn: 369013
Summary: Fix "llvm-profdata show" so it can work with compact binary format profile. The change is to mark all functions "used" so SampleProfileReaderCompactBinary::read will read in all profiles available for dumping. The function names will be MD5 hash for compact binary format.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65162
llvm-svn: 368731
Summary:
This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile.
GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data.
Reviewers: wmi, davidxl, danielcdh
Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65848
llvm-svn: 368596
Support loading code coverage data from regular archives, thin archives,
and from MachO universal binaries which contain archives.
Testing: check-llvm, check-profile (with {A,UB}San enabled)
rdar://51538999
Differential Revision: https://reviews.llvm.org/D63232
llvm-svn: 363325
Add overlap functionality to llvm-profdata tool to compute the similarity
between two profile files.
Differential Revision: https://reviews.llvm.org/D60977
llvm-svn: 359612