This patch optimizes two memory intrinsic operations: memset and memcpy based
on the profiled size of the operation. The high level transformation is like:
mem_op(..., size)
==>
switch (size) {
case s1:
mem_op(..., s1);
goto merge_bb;
case s2:
mem_op(..., s2);
goto merge_bb;
...
default:
mem_op(..., size);
goto merge_bb;
}
merge_bb:
Differential Revision: http://reviews.llvm.org/D28966
llvm-svn: 299446
This patch adds the value profile support to profile the size parameter of
memory intrinsic calls: memcpy, memcmp, and memmov.
Differential Revision: http://reviews.llvm.org/D28965
llvm-svn: 297897
Summary:
In SamplePGO, if the profile is collected from non-LTO binary, and used to drive ThinLTO, the indirect call promotion may fail because ThinLTO adjusts local function names to avoid conflicts. There are two places of where the mismatch can happen:
1. thin-link prepends SourceFileName to front of FuncName to build the GUID (GlobalValue::getGlobalIdentifier). Unlike instrumentation FDO, SamplePGO does not use the PGOFuncName scheme and therefore the indirect call target profile data contains a hash of the OriginalName.
2. backend compiler promotes some local functions to global and appends .llvm.{$ModuleHash} to the end of the FuncName to derive PromotedFunctionName
This patch tries at the best effort to find the GUID from the original local function name (in profile), and use that in ICP promotion, and in SamplePGO matching that happens in the backend after importing/inlining:
1. in thin-link, it builds the map from OriginalName to GUID so that when thin-link reads in indirect call target profile (represented by OriginalName), it knows which GUID to import.
2. in backend compiler, if sample profile reader cannot find a profile match for PromotedFunctionName, it will try to find if there is a match for OriginalFunctionName.
3. in backend compiler, we build symbol table entry for OriginalFunctionName and pointer to the same symbol of PromotedFunctionName, so that ICP can find the correct target to promote.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30754
llvm-svn: 297757
Summary: We do not need that special handling because the debug info is more accurate now. Performance testing shows no regression on google internal benchmarks.
Reviewers: davidxl, aprantl
Reviewed By: aprantl
Subscribers: llvm-commits, aprantl
Differential Revision: https://reviews.llvm.org/D30658
llvm-svn: 297038
Summary:
Reset the ValueData for each function to avoid using the ones in
the previous function.
Reviewers: davidxl
Reviewed By: davidxl
Subscribers: llvm-commits, xur
Differential Revision: https://reviews.llvm.org/D30479
llvm-svn: 296916
Summary: For SamplePGO, the profile may contain cross-module inline stacks. As we need to make sure the profile annotation happens when all the hot inline stacks are expanded, we need to pass this info to the module importer so that it can import proper functions if necessary. This patch implemented this feature by emitting cross-module targets as part of function entry metadata. In the module-summary phase, the metadata is used to build call edges that points to functions need to be imported.
Reviewers: mehdi_amini, tejohnson
Reviewed By: tejohnson
Subscribers: davidxl, llvm-commits
Differential Revision: https://reviews.llvm.org/D30053
llvm-svn: 296498
This patch reverts r291588: [PGO] Turn off comdat renaming in IR PGO by default,
as we are seeing some hash mismatches in our internal tests.
llvm-svn: 291621
Summary:
In IR PGO we append the function hash to comdat functions to avoid the
potential hash mismatch. This turns out not legal in some cases: if the comdat
function is address-taken and used in comparison. Renaming changes the semantic.
This patch turns off comdat renaming by default.
To alleviate the hash mismatch issue, we now rename the profile variable
for comdat functions. Profile allows co-existing multiple versions of profiles
with different hash value. The inlined copy will always has the correct profile
counter. The out-of-line copy might not have the correct count. But we will
not have the bogus mismatch warning.
Reviewers: davidxl
Subscribers: llvm-commits, xur
Differential Revision: https://reviews.llvm.org/D28416
llvm-svn: 291588
Programs with very large __llvm_covmap sections may fail to link on
Darwin because because of out-of-range 32-bit RIP relative references.
It isn't possible to work around this by using the large code model
because it isn't supported on Darwin. One solution is to move the
__llvm_covmap section past the end of the __DATA segment.
=== Testing ===
In addition to check-{llvm,clang,profile}, I performed a link test on a
simple object after injecting ~4GB of padding into __llvm_covmap:
@__llvm_coverage_padding = internal constant [4000000000 x i8] zeroinitializer, section "__LLVM_COV,__llvm_covmap", align 8
(This test is too expensive to check-in.)
=== Backwards Compatibility ===
This patch should not pose any backwards-compatibility concerns. LLVM
is expected to scan all of the sections in a binary for __llvm_covmap,
so changing its segment shouldn't affect anything. I double-checked this
by loading coverage produced by an unpatched compiler with a patched
llvm-cov.
Suggested by Nick Kledzik.
llvm-svn: 285360
All of these existed because MSVC 2013 was unable to synthesize default
move ctors. We recently dropped support for it so all that error-prone
boilerplate can go.
No functionality change intended.
llvm-svn: 284721
Profile runtime can generate an empty raw profile (when there is no function in
the shared library). This empty profile is treated as a text format profile. A
test format profile without the flag of "#IR" is thought to be a clang
generated profile. So in llvm profile merging, we will get a bogus warning of
"Merge IR generated profile with Clang generated profile."
The fix here is to skip the empty profile (when the buffer size is 0) for
profile merge.
Reviewers: vsk, davidxl
Differential Revision: http://reviews.llvm.org/D25687
llvm-svn: 284659
Add support for loading multiple coverage readers into a single
CoverageMapping instance. This should make it easier to prepare a
unified coverage report for multiple binaries.
Differential Revision: https://reviews.llvm.org/D25535
llvm-svn: 284251
On Darwin, marking a section as "regular,live_support" means that a
symbol in the section should only be kept live if it has a reference to
something that is live. Otherwise, the linker is free to dead-strip it.
Turn this functionality on for the __llvm_prf_data section.
This means that counters and data associated with dead functions will be
removed from dead-stripped binaries. This will result in smaller
profiles and binaries, and should speed up profile collection.
Tested with check-profile, llvm-lit test/tools/llvm-{cov,profdata}, and
check-llvm.
Differential Revision: https://reviews.llvm.org/D25456
llvm-svn: 283947
Summary: The call target count profile is directly derived from LBR branch->target data. This is more reliable than instruction frequency profiles that could be moved across basic block boundaries. This patches uses call target count profile to annotate call instructions.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D24410
llvm-svn: 281911
Move the comparison function into the only place there it is used,
i.e. the call to std::stable_sort in CoverageMappingWriter::write().
Add sorting by region kinds as it is required to ensure stable order
in our tests and to simplify D23987.
Differential Revision: https://reviews.llvm.org/D24034
llvm-svn: 280198
Move needsComdatForCounter() to lib/ProfileData/InstrProf.cpp from
lib/Transforms/Instrumentation/InstrProfiling.cpp to make is available for
other files.
Differential Revision: https://reviews.llvm.org/D22643
llvm-svn: 276330
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Changes since the initial commit:
- When handling odd-length inputs, call ThreadPool::wait() before merging the
last profile. Should fix a race/off-by-one (see r275937).
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275938
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.
I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:
No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total
Differential Revision: https://reviews.llvm.org/D22438
llvm-svn: 275921
This reverts commit 520a8298d8ef676b5da617ba3d2c7fa37381e939 (r273055).
This is breaking stage2 instrumented builds with "malformed coverage
data" errors.
llvm-svn: 274106
Pass a `MemoryBuffer &` to BinaryCoverageReader::create() instead of a
`std::unique_ptr<MemoryBuffer> &`. This makes it easier to reason about
the ownership of the buffer at a glance.
llvm-svn: 273326
Currently, frontends which emit source-based code coverage have to
duplicate logic to encode filenames and raw coverage mappings properly.
This violates an abstraction layer and forces frontends to copy tricky
code.
Introduce llvm::coverage::encodeFilenamesAndRawMappings() to take care
of this.
This will help us experiment with zlib-compressing coverage mapping
data.
llvm-svn: 273055