llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00

Author	SHA1	Message	Date
Kazu Hirata	6b0d4e7fd3	[llvm] Use *Map::lookup (NFC)	2021-01-01 12:44:54 -08:00
Simon Pilgrim	1c482de06b	[SampleFDO] Fix uninitialized field warnings. NFCI. Seems to have been caused by D93254 which added the SecHdrTableEntry::LayoutIndex field.	2020-12-17 15:51:26 +00:00
Barry Revzin	2fc9f32ca3	Make LLVM build in C++20 mode Part of the <=> changes in C++20 makes certain patterns of writing equality operators ambiguous with themselves (sorry!). This patch goes through and adjusts all the comparison operators such that they should work in both C++17 and C++20 modes. It also makes two other small C++20-specific changes (adding a constructor to a type that ceases to be an aggregate, and adding casts from u8 literals which no longer have type const char*). There were four categories of errors that this review fixes. Here are canonical examples of them, ordered from most to least common: // 1) Missing const namespace missing_const { struct A { #ifndef FIXED bool operator==(A const&); #else bool operator==(A const&) const; #endif }; bool a = A{} == A{}; // error } // 2) Type mismatch on CRTP namespace crtp_mismatch { template <typename Derived> struct Base { #ifndef FIXED bool operator==(Derived const&) const; #else // in one case changed to taking Base const& friend bool operator==(Derived const&, Derived const&); #endif }; struct D : Base<D> { }; bool b = D{} == D{}; // error } // 3) iterator/const_iterator with only mixed comparison namespace iter_const_iter { template <bool Const> struct iterator { using const_iterator = iterator<true>; iterator(); template <bool B, std::enable_if_t<(Const && !B), int> = 0> iterator(iterator<B> const&); #ifndef FIXED bool operator==(const_iterator const&) const; #else friend bool operator==(iterator const&, iterator const&); #endif }; bool c = iterator<false>{} == iterator<false>{} // error || iterator<false>{} == iterator<true>{} || iterator<true>{} == iterator<false>{} || iterator<true>{} == iterator<true>{}; } // 4) Same-type comparison but only have mixed-type operator namespace ambiguous_choice { enum Color { Red }; struct C { C(); C(Color); operator Color() const; bool operator==(Color) const; friend bool operator==(C, C); }; bool c = C{} == C{}; // error bool d = C{} == Red; } Differential Revision: https://reviews.llvm.org/D78938	2020-12-17 10:44:10 +00:00
Wei Mi	74da8abf7f	[NFC][SampleFDO] Preparation to support multiple sections with the same type in ExtBinary format. Currently ExtBinary format doesn't support multiple sections with the same type in the profile. We add the support in this patch. Previously we used the section type to identify a section uniquely. Now we introduce a LayoutIndex in the SecHdrTableEntry and use the LayoutIndex to locate the target section. The allocations of NameTable and FuncOffsetTable are adjusted accordingly. Currently this is NFC because it doesn't change anything for the current layout. The test for multiple sections support will be included in another patch where a new type of profile containing multiple sections with the same type is introduced. Differential Revision: https://reviews.llvm.org/D93254	2020-12-16 22:28:45 -08:00
Hongtao Yu	66121fdf6a	[CSSPGO] Consume pseudo-probe-based AutoFDO profile This change enables pseudo-probe-based sample counts to be consumed by the sample profile loader under the regular `-fprofile-sample-use` switch with minimal adjustments to the existing sample file formats. After the counts are imported, a probe helper, a.k.a. a `PseudoProbeManager` object, is automatically launched to verify the CFG checksum of every function in the current compilation against the corresponding checksum from the profile. Mismatched checksums will cause a function profile to be skipped. A `SampleProfileProber` pass is scheduled before any of the `SampleProfileLoader` instances so that the CFG checksums as well as probe mappings are available during profile loading. The `PseudoProbeManager` object is set up right after the profile reading is done. In the future, a CFG-based fuzzy matching could be done in `PseudoProbeManager`. Samples will be applied only to pseudo probe instructions as well as probed callsites once the checksum verification goes through. Those instructions are processed in the same way that regular instructions would be processed in the line-number-based scenario. In other words, a function is processed in a regular way as if it was reduced to just containing pseudo probes (block probes and callsites). Adjustment to profile format: A CFG checksum field is being added to the existing AutoFDO profile formats. So far only the text format and the extended binary format are supported. For the text format, a new line like `!CFGChecksum: 12345` is added to the end of the body sample lines. For the extended binary profile format, we introduce a metadata section to store the checksum map from function names to their CFG checksums. Differential Revision: https://reviews.llvm.org/D92347	2020-12-16 15:57:18 -08:00
Fangrui Song	506848f563	[llvm-cov gcov] Replace Donald B. Johnson's cycle enumeration with iterative cycle finding gcov computes the line execution count as the sum of (a) counts from predecessors on other lines and (b) the sum of loop execution counts of blocks on the same line (think of loops on one line). For (b), we use Donald B. Johnson's cycle enumeration algorithm and perform cycle cancelling for each cycle. The number of candidate cycles was exponential, and D93036 made it polynomial by skipping zero count cycles. The time complexity is still high (O(VE^2); it could be O(E^2), but the linear `Blocks` check made it higher) and the implementation is complex. We could just identify loops and sum all back edges. However, this requires a dominator tree construction, which is more complex, though the time complexity can be decreased to almost linear. This patch just performs cycle cancelling iteratively. Add two members `traversable` and `incoming` to GCOVArc. There are 3 states: * `!traversable`: blocks not on this line or explored blocks * `traversable && incoming == nullptr`: unexplored blocks * `traversable && incoming != nullptr`: blocks which are being explored (on the stack) If an arc points to a block being explored, a cycle has been found. Let E be the number of arcs. Every time a cycle is found, at least one arc is saturated (`edgeCount` reduced to 0), so there are at most E cycles. Finding one cycle takes O(E) time, so the overall time complexity is O(E^2). Note that we always augment through a back edge and never need to augment its reverse edge, so the reverse edges of traditional flow networks are not needed. Reviewed By: xinhaoyuan Differential Revision: https://reviews.llvm.org/D93073	2020-12-11 18:28:16 -08:00
Xinhao Yuan	38310b229d	[llvm-cov][gcov] Optimize the cycle counting algorithm by skipping zero count cycles This change is similar to http://gcc.gnu.org/PR90380. This reduces the complexity from exponential to polynomial in the number of arcs. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D93036	2020-12-10 15:22:29 -08:00
Wei Mi	2b916be8fe	[SampleFDO] Store fixed length MD5 in NameTable instead of using ULEB128 if MD5 is used. Currently during sample profile loading, NameTable has to be loaded entirely up front before any name string is retrieved. That is because NameTable is stored using ULEB128 encoding and cannot be directly accessed like an array. However, if MD5 is used to represent names in the NameTable, they have a fixed length. If MD5 names are stored as uint64_t instead of ULEB128, NameTable can be accessed like an array, and in many cases only part of the NameTable has to be read. This is helpful for reducing compile time, especially when small source files are compiled. We find that after this change, the elapsed time to build a large application distributively is reduced by 5% and the accumulated CPU time used for building is also reduced by 5%. The size of the profile is slightly reduced with this change, by ~0.2%, which also indicates that encoding MD5 in ULEB128 doesn't save storage space. Differential Revision: https://reviews.llvm.org/D92621	2020-12-08 16:21:01 -08:00
Pan, Tao	5725c8779a	[CodeGen] Add text section prefix for COFF object file The text section prefix is created in CodeGenPrepare, which is a file-format-independent implementation, while the text section name is written into the object file in TargetLoweringObjectFile, which is file-format dependent. This patch ports the code that adds the text section prefix to the text section name from ELF to COFF. Unlike ELF, which uses '.' as the concatenation character, COFF uses '$'. Because the concatenation character varies, it is split out of the text section prefix. The text section prefix is an existing ELF feature; it can help reduce icache and iTLB misses, and it also makes it possible to aggregate same-prefix sections created by other compilers, e.g. v8. Furthermore, the recent Machine Function Splitter feature (basic-block-level text prefix sections) is based on the text section prefix. Reviewed By: pengfei, rnk Differential Revision: https://reviews.llvm.org/D92073	2020-12-08 18:56:21 +08:00
wlei	db7fa377e4	[CSSPGO][llvm-profgen] Context-sensitive profile data generation This stack of changes introduces the `llvm-profgen` utility, which generates a profile data file from given perf script data files for sample-based PGO. It is part of (but not limited to) the CSSPGO work. Specifically, to support context-sensitive profiles with or without pseudo probes, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. High throughput is achieved through multiple levels of sample aggregation, and a profile in a compatible format is generated in one step at the end. Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change adds context-sensitive profile data generation to llvm-profgen. With simultaneous sampling of LBR and call stack, we can identify the leaf of an LBR sample with the calling context from the stack sample. While deriving fall-through paths from LBR entries, we unwind the LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. The state of the call stack as we unwind through the LBR then always represents the calling context of the current fall-through path. We have two types of virtual unwinding: 1) LBR unwinding and 2) linear range unwinding. Specifically, each LBR entry is classified as a call, a return, or a regular branch, and LBR unwinding replays the operation by pushing, popping, or switching the leaf frame of the call stack. Since the initial call stack is the most recently sampled state, the replay is in anti-execution order, i.e. in the regular case, pop the call stack when the LBR entry is a call, and push a frame onto the call stack when it is a return. After each LBR entry is processed, the unwinder also needs to align with the next entry by going through the instructions from the previous LBR's target to the current LBR's source; we call this linear unwinding. 
As instructions in a linear range can come from different functions due to inlining, linear unwinding splits the range and records counters for each sub-range with the same inline context. With each fall-through path from LBR unwinding, we aggregate each sample into counters by calling context and eventually generate a full context-sensitive profile (without relying on inlining) to drive the compiler's PGO/FDO. A breakdown of noteworthy changes: - Added the `HybridSample` class as the abstraction of a perf sample, including the LBR stack and the call stack - Extended `PerfReader` to auto-detect whether the input perf script output contains a CS profile, then do the parsing; multiple `HybridSample`s are extracted - Sped up processing by aggregating `HybridSample`s into `AggregatedSamples` - Added `VirtualUnwinder`, which consumes aggregated `HybridSample`s and implements unwinding of calls, returns, and linear paths that contain implicit calls/returns from inlining. Range and branch counters are aggregated by calling context. Here the calling context is a string in which each frame is a pair of function name and callsite location info, e.g. `main:1 @ foo:2 @ bar`. - Added `ProfileGenerator`, which accumulates counters by range unfolding or branch target mapping, then generates context-sensitive function profiles, including function bodies, inferred callee head samples, and callsite target samples, and records them into ProfileMap. - Leveraged LLVM's built-in writer (`SampleProfWriter`) to support different serialization formats - Used `getCanonicalFnName` for callee names and names from the ELF section - Added regression tests for both unwinding and profile generation Test Plan: ninja & ninja check-llvm Reviewed By: hoy, wenlei, wmi Differential Revision: https://reviews.llvm.org/D89723	2020-12-07 13:48:58 -08:00
Wenlei He	6ab8756fe0	[CSSPGO] Infrastructure for context-sensitive Sample PGO and Inlining This change adds the context-sensitive sample PGO infrastructure described in the CSSPGO RFC (https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s). It introduces an abstraction between the input profile and the profile loader that queries the input profile for functions. Specifically, there is now the notion of base profile and context profile, and they are managed by the new SampleContextTracker for adjusting and merging profiles based on inline decisions. It works with the top-down profile-guided inliner in the profile loader (https://reviews.llvm.org/D70655) for better inlining with specialization and better post-inline profile fidelity. In the future, we can also expose this infrastructure to the CGSCC inliner in order for it to take advantage of context-sensitive profiles. This change is the consumption part of context-sensitive profiles (the generation part is in this stack: https://reviews.llvm.org/D89707). We've seen good results internally in conjunction with Pseudo-probe (https://reviews.llvm.org/D86193). Patches for integration with Pseudo-probe are coming up soon. Currently the new infrastructure kicks in when the input profile contains the new context-sensitive profile; otherwise it's a no-op and does not affect existing AutoFDO. Interface: There are two sets of interfaces, for query and tracking respectively, exposed from SampleContextTracker. For query, instead of simply getting a profile from input for a function, we can now explicitly query the base profile or context profile for a given call path of a function. For tracking, there are separate APIs for marking a context profile as inlined, or promoting and merging a not-inlined context profile. - Query base profile (`getBaseSamplesFor`) The base profile is the merged synthetic profile for a function's CFG profile from any outstanding (not inlined) context. We can query the base profile by function. 
- Query context profile (`getContextSamplesFor`) A context profile is a function's CFG profile for a given calling context. We can query a context profile by context string. - Track inlined context profile (`markContextSamplesInlined`) When a function is inlined for a given calling context, we need to mark the context profile for that context as inlined. This makes sure we don't include the inlined context profile when synthesizing the base profile for that inlined function. - Track not-inlined context profile (`promoteMergeContextSamplesTree`) When a function is not inlined for a given calling context, we need to promote the context profile tree so the not-inlined context becomes a top-level context. This preserves the sub-contexts under that function, so later inline decisions for that not-inlined function will still have context profiles for its call tree. Note that profiles will be merged if needed when promoting a context profile tree, if any of the nodes already exists at its promoted destination. Implementation: Implementation-wise, `SampleContext` is created as the abstraction for context. Currently it's a string for the call path, and we can later optimize it to something more efficient, e.g. a context id. Each `SampleContext` also has a `ContextState` indicating whether it's a raw context profile from input, whether it's inlined or merged, and whether it's a synthetic profile created by the compiler. Each `FunctionSamples` now has a `SampleContext` that tells whether it's a base profile or a context profile, and for a context profile, what the context and state are. On top of the above context representation, a custom trie tree is implemented to track and manage context profiles. Specifically, `SampleContextTracker` is implemented to encapsulate a trie tree with `ContextTrieNode` as node. Each node of the trie tree represents a frame in the calling context, thus the path from the root to a node represents a valid calling context. 
We also track `FunctionSamples` for each node, so this trie tree can serve efficient queries for context profiles. Accordingly, context profile tree promotion now becomes moving a subtree to be under the root of the entire tree, and merging nodes of the subtree if this move encounters existing nodes. Integration: `SampleContextTracker` is now also integrated with AutoFDO, `SampleProfileReader` and `SampleProfileLoader`. When the input profile is detected to contain a context-sensitive profile, `SampleContextTracker` will be used to track profiles, and all profile queries will go to `SampleContextTracker` instead of `SampleProfileReader` automatically. Tracking APIs are called automatically for each inline decision from `SampleProfileLoader`. Differential Revision: https://reviews.llvm.org/D90125	2020-12-06 11:49:18 -08:00
Wei Mi	9479e29c3c	[NFC][SampleFDO] Move some common stuff from SampleProfileReaderExtBinary/WriterExtBinary to their parent classes. SampleProfileReaderExtBinary/SampleProfileWriterExtBinary specify the typical section layout currently used by SampleFDO. Currently a lot of section readers/writers stay in the two classes. However, as we expect to have more types of SampleFDO profiles, we hope those new types of profiles can share the common sections while configuring their own sections easily with minimal change. That is why I moved some common stuff from SampleProfileReaderExtBinary/SampleProfileWriterExtBinary to SampleProfileReaderExtBinaryBase/SampleProfileWriterExtBinaryBase, so new profile classes inheriting from the base classes can reuse them. Differential Revision: https://reviews.llvm.org/D89524	2020-10-22 15:56:55 -07:00
Hiroshi Yamauchi	7e9ad11889	[PGO] Remove the old memop value profiling buckets. Following up D81682 and D83903, remove the code for the old value profiling buckets, which have been replaced with the new, extended buckets and disabled by default. Also syncing InstrProfData.inc between compiler-rt and llvm. Differential Revision: https://reviews.llvm.org/D88838	2020-10-15 10:09:49 -07:00
Vedant Kumar	267d2a2041	[llvm-cov] Warn when -arch spec is missing/invalid for universal binary (reland) llvm-cov reports a poor error message when the -arch specifier is missing or invalid, and a binary has multiple slices. Make the error message more specific. (This version of the patch avoids using llvm::none_of -- the way I used the utility caused compile errors on many bots, possibly because the wrong overload of `none_of` was selected.) rdar://40312677	2020-10-13 16:46:03 -07:00
Vedant Kumar	65259aae54	Revert "[llvm-cov] Warn when -arch spec is missing/invalid for universal binary" This reverts commit b81d4bfb44c14575130bb06c047728b69c3213aa. It's causing some bots to fail to build due to: "error: no matching function for call to ‘__iterator_category".	2020-10-13 16:32:31 -07:00
Vedant Kumar	40fbc09049	[llvm-cov] Warn when -arch spec is missing/invalid for universal binary llvm-cov reports a poor error message when the -arch specifier is missing or invalid, and a binary has multiple slices. Make the error message more specific. rdar://40312677	2020-10-13 16:29:26 -07:00
Simon Pilgrim	bb635e356b	Remove unnecessary forward declarations. NFCI. All of these forward declarations are fully defined in headers that are directly included.	2020-09-17 13:31:52 +01:00
Fangrui Song	1bd4869627	[llvm-cov gcov] Add --demangled-names (-m) gcov 4.9 introduced the option.	2020-09-16 23:18:50 -07:00
Fangrui Song	c8bd947872	[llvm-cov gcov] Refactor counting and reporting The current organization of FileInfo and the utility functions it references (in GCOVFile, GCOVFunction, GCOVBlock) is messy. Some members of FileInfo are just copied from GCOVFile. FileInfo::print (.gcov output and --intermediate output) is interleaved with branch statistics and computation of line execution counts. --intermediate has to do redundant .gcov output to gather branch statistics. This patch deletes lots of code and introduces a clearer workflow:
```
fn collectFunction
  for each block b
    for each line lineNum
      let line be LineInfo of the file on lineNum
      line.exists = 1
      increment function's lines & linesExec if necessary
      increment line.count
      line.blocks.push_back(&b)

fn collectSourceLine
  compute cycle counts
  count = incoming_counts + cycle_counts
  if line.exists
    ++summary->lines
    if line.count
      ++summary->linesExec

fn collectSource
  for each line
    call collectSourceLine

fn main
  for each function
    call collectFunction
    print function summary
  for each source file
    call collectSource
    print file summary
    annotate the source file with line execution counts
  if -i
    print intermediate file
```
The output order of functions and files now follows the original order in .gcno files.	2020-09-13 23:00:59 -07:00
Fangrui Song	0637c5d6a0	[llvm-cov gcov] Add -r (--relative-only) && -s (--source-prefix) gcov 4.7 introduced the two options. https://sourceware.org/pipermail/gcc-patches/2011-November/328782.html -r only dumps files with relative paths or absolute paths with the prefix specified by -s. The two options are useful for filtering out system header files.	2020-09-13 14:54:20 -07:00
Fangrui Song	f6898867d5	[llvm-cov gcov] Improve accuracy when some edges are not measured Also guard against infinite recursion if GCOV_ARC_ON_TREE edges contain a cycle.	2020-09-12 22:33:41 -07:00
Fangrui Song	789e9aff28	[llvm-cov gcov] Compute unmeasured arc counts by Kirchhoff's circuit law For a CFG G=(V,E), Knuth describes that by Kirchhoff's circuit law, the minimum number of counters necessary is |E|-(|V|-1). The emitted edges form a spanning tree. .gcda files emitted by libgcov leverage this optimization while those emitted by clang --coverage don't. Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can correctly print line counts of gcc --coverage emitted files and enable the future improvement of clang --coverage.	2020-09-08 18:45:11 -07:00
Wei Mi	d9e172d0fb	[SampleFDO] Enhance profile remapping support for searching inline instance and indirect call promotion candidate. Profile remapping is a feature to match a function in the module with its profile in sample profile if the function name and the name in profile look different but are equivalent using given remapping rules. This is a useful feature to keep the performance stable by specifying some remapping rules when sampleFDO targets are going through some large scale function signature change. However, currently profile remapping support is only valid for outline function profile in SampleFDO. It cannot match a callee with an inline instance profile if they have different but equivalent names. We found that without the support for inline instance profile, remapping is less effective for some large scale change. To add that support, before any remapping lookup happens, all the names in the profile will be inserted into remapper and the Key to the name mapping will be recorded in a map called NameMap in the remapper. During name lookup, a Key will be returned for the given name and it will be used to extract an equivalent name in the profile from NameMap. So with the help of the NameMap, we can translate any given name to an equivalent name in the profile if it exists. Whenever we try to match a name in the module to a name in the profile, we will try the match with the original name first, and if it doesn't match, we will use the equivalent name got from remapper to try the match for another time. In this way, the patch can enhance the profile remapping support for searching inline instance and searching indirect call promotion candidate. In a planned large scale change of int64 type (long long) to int64_t (long), we found the performance of a google internal benchmark degraded by 2% if nothing was done. If existing profile remapping was enabled, the performance degradation dropped to 1.2%. 
If the profile remapping with the current patch was enabled, the performance degradation further dropped to 0.14% (Note the experiment was done before searching indirect call promotion candidate was added. We hope with the remapping support of searching indirect call promotion candidate, the degradation can drop to 0% in the end. It will be evaluated post commit). Differential Revision: https://reviews.llvm.org/D86332	2020-08-26 11:07:35 -07:00
Hiroshi Yamauchi	0b0a5993c1	[PGO] Extend the value profile buckets for mem op sizes. Extend the memop value profile buckets to be more flexible (could accommodate a mix of individual values and ranges) and to cover more value ranges (from 11 to 22 buckets). Disabled behind a flag (to be enabled separately) and the existing code to be removed later. Differential Revision: https://reviews.llvm.org/D81682	2020-08-03 11:04:32 -07:00
Wei Mi	b565123367	Fix a crash when the sample profile uses md5 and -sample-profile-merge-inlinee is enabled. When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be created during profile merge without GUIDToFuncNameMap being initialized. That will occasionally cause compiler crash. The patch fixes it. Differential Revision: https://reviews.llvm.org/D84994	2020-07-30 21:21:06 -07:00
Wei Mi	51d4708437	Supplement instr profile with sample profile. PGO profile is usually more precise than sample profile. However, PGO profile needs to be collected from loadtest and loadtest may not be representative enough of the production workload. Sample profile collected from production can be used as a supplement -- for functions cold in loadtest but warm/hot in production, we can scale up the related function in PGO profile if the function is warm or hot in sample profile. The implementation contains changes on the compiler side and the llvm-profdata side. Given an instr profile and a sample profile, for a function cold in PGO profile but warm/hot in sample profile, llvm-profdata will either mark all the counters in the profile to be -1 or scale up the max count in the function to be above hot threshold, depending on the zero counter ratio in the profile. The assumption is that if too many counters in the function profile are zero, the profile is more likely to cause harm than good, so llvm-profdata will mark all the counters as -1, indicating the function is hot but the profile is unaccountable. On the compiler side, if a function profile with all -1 counters is seen, the function entry count will be set to be above hot threshold but its internal profile will be dropped. In the long run, it may be useful to let the compiler support using PGO profile and sample profile at the same time, but that requires more careful design and more substantial changes to make two profiles work seamlessly. The patch here serves as a simple intermediate solution. Differential Revision: https://reviews.llvm.org/D81981	2020-07-27 20:17:40 -07:00
Fangrui Song	4e9b56ee13	Revert D81682 "[PGO] Extend the value profile buckets for mem op sizes." This reverts commit 4a539faf74b9b4c25ee3b880e4007564bd5139b0. There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang:
```
(gdb) bt
llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) ()
llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const*, llvm::SCEV const*, llvm::SCEV const*, unsigned int) ()
```
(The body of __llvm_profile_instrument_range is inlined, so we can only find __llvm_profile_instrument_target in the trace)
```
0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1)
0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1)
0x000055555dba0970 <+80>: mov %rsi,%rbx
0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV
0x000055555dba0977 <+87>: cmp %r15,(%rbx)
0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342>
```	2020-07-22 16:08:25 -07:00
Rong Xu	005085c634	[PGO] Supporting code for always instrumenting entry block This patch includes the supporting code that enables always instrumenting the function entry block by default. This patch does NOT change the default behavior. It adds a variant bit in the profile version, adds new directives in text profile format, and changes llvm-profdata tool accordingly. This patch is a split of D83024 (https://reviews.llvm.org/D83024) Many test changes from D83024 are also included. Differential Revision: https://reviews.llvm.org/D84261	2020-07-22 15:01:53 -07:00
Hiroshi Yamauchi	a85cda4f5a	[PGO] Extend the value profile buckets for mem op sizes. Extend the memop value profile buckets to be more flexible (could accommodate a mix of individual values and ranges) and to cover more value ranges (from 11 to 22 buckets). Disabled behind a flag (to be enabled separately) and the existing code to be removed later.	2020-07-15 10:26:15 -07:00
Wei Mi	7fc0e8b3ed	[NFC] Change getEntryForPercentile to be a static function in ProfileSummaryBuilder. Change file static function getEntryForPercentile to be a static member function in ProfileSummaryBuilder so it can be used by other files. Differential Revision: https://reviews.llvm.org/D83439	2020-07-09 16:38:19 -07:00
Hiroshi Yamauchi	b3de353064	Revert "[PGO] Extend the value profile buckets for mem op sizes." This reverts commit 63a89693f09f6b24ce4f2350d828150bd9c4f3e8. Due to a build failure like http://lab.llvm.org:8011/builders/sanitizer-windows/builds/65386/steps/annotate/logs/stdio	2020-06-25 11:13:49 -07:00
Hiroshi Yamauchi	754259b7af	[PGO] Extend the value profile buckets for mem op sizes. Extend the memop value profile buckets to be more flexible (could accommodate a mix of individual values and ranges) and to cover more value ranges (from 11 to 22 buckets). Disabled behind a flag (to be enabled separately) and the existing code to be removed later. Differential Revision: https://reviews.llvm.org/D81682	2020-06-25 10:22:56 -07:00
Fangrui Song	b24b7c4c54	[llvm-profdata] --hot-func-list: fix some style issues in D81800 Reviewed By: wenlei, hoyFB Differential Revision: https://reviews.llvm.org/D82500	2020-06-24 15:17:03 -07:00
weihe	4f72d6501a	Add --hot-func-list to llvm-profdata show for sample profiles Summary: Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample counts are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names. Test Plan: Reviewers: wenlei, hoyFB Reviewed By: wenlei, hoyFB Subscribers: hoyFB, wenlei, weihe, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82355	2020-06-24 12:49:46 -07:00
Bruno Ricci	3b131a7452	Revert "Add --hot-func-list to llvm-profdata show for sample profiles" This reverts commit 7348b951fe74f306970f6ac567fe5dddbb1c42d4. It is causing Asan failures.	2020-06-21 14:33:08 +01:00
weihe	e6a186e4ac	Add --hot-func-list to llvm-profdata show for sample profiles Summary: Add the --hot-func-list feature to llvm-profdata show for sample profiles. This feature prints a list of hot functions whose max sample counts are above the 99% threshold, with their numbers of total samples, total samples percentage, max samples, entry samples, and their function names. Reviewers: wmi, hoyFB, wenlei Reviewed By: wmi Subscribers: hoyFB, wenlei, llvm-commits, weihe Tags: #llvm Differential Revision: https://reviews.llvm.org/D81800	2020-06-20 10:13:36 -07:00
Fangrui Song	3ac41a5d21	[llvm-cov gcov] Don't suppress .gcov output if .gcda is corrupted If .gcda is corrupted, gcov continues to produce a .gcov and just assumes execution counts are zeros. This is reasonable, because the program can corrupt its .gcda output. The code path should be similar to the code path without .gcda.	2020-06-16 14:55:38 -07:00
Fangrui Song	d1d0909fd1	[gcov] Add -i --intermediate-format Between gcov 4.9~8, `gcov -i $file` prints coverage information to $file.gcov in an intermediate text format (single file, instead of $source.gcov for each source file). lcov newer than 2019-05-24 detects -i support and uses it to increase processing speed. gcov 9 (GCC r265587) removed --intermediate-format and -i was changed to mean --json-format. However, we consider this format still useful and support it. geninfo (part of lcov) supports this format even if we announce that we are compatible with gcov 9.0.0	2020-06-16 14:14:28 -07:00
Fangrui Song	69837fa7b5	[gcov] Refactor llvm-cov gcov and add SourceInfo	2020-06-16 14:14:26 -07:00
Fangrui Song	d6e9f4ed95	[llvm-cov] Fix gcov version detection on big-endian	2020-06-07 08:07:32 -07:00
Fangrui Song	a7a8160485	[gcov] Improve tests and lower the minimum supported version to gcov 3.4 global-ctor.ll no longer checks what it intended to check (@_GLOBAL__sub_I_global-ctor.ll needs a !dbg to work). Rewrite it. gcov 3.4 and gcov 4.2 use the same format, thus we can lower the version requirement to 3.4	2020-06-06 23:11:32 -07:00
Fangrui Song	9b29d98c47	[gcov] Improve .gcno compatibility with gcov and use DataExtractor llvm-cov.test and many Inputs/test* files contain wrong tests. This patch rewrites a large portion of these files. The pre-canned .gcno & .gcda are replaced by binaries produced by clang --coverage (compatible with gcov 4.8~7) (after some GCDAProfiling.c bugs were fixed by my previous commits). Also make llvm-cov gcov on a little-endian host capable to parse big-endian .gcno and .gcda, and make llvm-cov gcov on big-endian host capable to parse little-endian .gcno and .gcda	2020-06-03 19:29:21 -07:00
serge-sans-paille	e92c41e0e9	[PGO] Fix computation of function Hash And bump its version number accordingly. This is a patched recommit of 7c298c104bfe725d4315926a656263e8a5ac3054 Previous hash implementation was incorrectly passing an uint64_t, that got converted to an uint8_t, to finalize the hash computation. This led to different functions having the same hash if they only differ by the remaining statements, which is incorrect. Added a new test case that trivially tests that a small function change is reflected in the hash value. Not that as this patch fixes the hash computation, it would invalidate all hashes computed before that patch applies, this is why we bumped the version number. Update profile data hash entries due to hash function update, except for binary version, in which case we keep the buggy behavior for backward compatibility. Differential Revision: https://reviews.llvm.org/D79961	2020-05-27 09:15:21 +02:00
Fangrui Song	595401e441	[gcov] Default coverage version to '408' and delete CC1 option -coverage-exit-block-before-body gcov 4.8 (r189778) moved the exit block from the last to the second. The .gcda format is compatible with 4.7 but decoding libgcov 4.7 produced .gcda with gcov [4.7,8) can mistake the exit block, emit bogus `%s:'%s' has arcs from exit block\n` warnings, and print wrong `" returned %s` for branch statistics (-b). * decoding libgcov 4.8 produced .gcda with gcov 4.7 has similar issues. Also, rename "return block" to "exit block" because the latter is the appropriate term.	2020-05-12 09:14:03 -07:00
Hongtao Yu	9e3a860968	Properly add out-of-module functions to the import list This patch addresses two issues related to adding inline functions to the import list while recursively going through the profiling data. 1. For callsite samples, only add an inlined function to the import list if it's from outside of the module (i.e. only has a declaration inside the module). 2. For body samples, add each target function to the import list if it's from outside of the module (i.e. only has a declaration inside the module). Previously we were using getSubProgram() to check whether it has dbg info, which is inaccurate. This fix properly add imports and could improve the quality of the pass. Added a few changes to the test to catch these cases. Differential Revision: https://reviews.llvm.org/D79379	2020-05-11 10:00:14 -07:00
Xun Li	01dc1f624a	Remove an unused Module param Summary: In D65848 the function getFuncNameInModule was refactored to no longer use module. This diff removes the parameter and rename the function name to avoid confusion. Reviewers: wenlei, wmi, davidxl Reviewed By: wenlei Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79310	2020-05-10 22:09:55 -07:00
Fangrui Song	5ac915ed2c	[gcov] Implement --stdout -t gcov by default prints to a .gcov file. With --stdout, stdout is used. Some summary information is omitted. There is no separator for multiple source files.	2020-05-10 21:02:38 -07:00
Fangrui Song	65e4dbf091	[gcov] Don't skip leading zeros when reading a string Even a 2003 version of gcov_read_string does not have the behavior described by rL204881. The wrong impression might come from libclang_rt.profile (GCDAProfiling.c)'s corrupted output (since the initial import rL144865). Note, the corrupted output crashes gcov.	2020-05-10 10:16:34 -07:00
Fangrui Song	d7b402943b	[gcov] Fix .gcda decoding and support GCC 8, 9 and 10 GCDAProfiling.c unnecessarily writes function names to .gcda files. GCC 4.2 gcc/libgcov.c (now renamed to libgcc/libgcov*) did not write function names. gcov-7 (compatible) crashes on .gcda produced by libclang_rt.profile rL176173 realized the problem and introduced a mode to remove function names. llvm-cov code apparently takes GCDAProfiling.c output format as truth and tries to decode function names. Additionally, llvm-cov tries to decode tags in certain order which does not match libgcov emitted .gcda files. This patch fixes the .gcda decoder and makes it work with GCC 8 and 9 (10 is compatible with 9). Note, line statistics are broken and not fixed by this patch. Add test/tools/llvm-cov/gcov-{4.7,8,9}.c to test compatibility.	2020-05-10 09:55:23 -07:00
Simon Pilgrim	8cba050e0c	CoverageMapping.h - remove unused StringSet.h include. NFC.	2020-05-10 14:19:54 +01:00
Wei Mi	0ca1ed2836	Recommit [SampleFDO] Add flag for partial profile. Fix the error of show-prof-info.test on some platforms without zlib. The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them. Differential Revision: https://reviews.llvm.org/D77426	2020-04-07 14:28:25 -07:00
Wei Mi	11ae325c37	Revert "[SampleFDO] Add flag for partial profile." show-prof-info.test breaks on some platforms. This reverts commit e3ba652a1440794eff0b43ce747f1b0488585d22.	2020-04-07 12:54:51 -07:00
Wei Mi	eb03930b3a	[SampleFDO] Add flag for partial profile. The common profile usage is to collect profile from a target and then use the profile to guide the optimized build for the same target. There are some cases that no profile can be collected for a target. In those cases, although no full profile is available, it is possible to have some partial profile collected from other targets to optimize common libraries and utilities. A flag is needed to tell the partial profile from the full profile apart, so compiler can use different strategy for them. Differential Revision: https://reviews.llvm.org/D77426	2020-04-07 12:17:56 -07:00
Guillaume Chatelet	d1b2c70fb7	Fix missing override	2020-03-31 07:41:36 +00:00
Wei Mi	a3742f4d36	[SampleFDO] Port MD5 name table support to extbinary format. Compbinary format uses MD5 to represent strings in name table. That gives smaller profile without the need of compression/decompression when writing/reading the profile. The patch adds the support in extbinary format. It is off by default but user can choose to enable it. Note the feature of using MD5 in name table can bring very small chance of name conflict leading to profile mismatch. Besides, profile using the feature won't have the profile remapping support. Differential Revision: https://reviews.llvm.org/D76255	2020-03-30 22:07:08 -07:00
Vedant Kumar	57ed845d56	[Coverage] Collect all function records in an object (D69471 followup) After the format change from D69471, there can be more than one section in an object that contains coverage function records. Look up each of these sections and concatenate all the records together. This re-enables the instrprof-merging.cpp test, which previously was failing on OSes which use comdats. Thanks to Jeremy Morse, who very kindly provided object files from the bot I broke to help me debug.	2020-03-02 12:01:09 -08:00
Vedant Kumar	616b414333	Add cast to appease clang-armv7-linux-build-cache (D69471 followup) http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/27075 error: non-constant-expression cannot be narrowed from type 'uint64_t' (aka 'unsigned long long') to 'size_t' (aka 'unsigned int') in initializer list [-Wc++11-narrowing] return {MappingBuf, getDataSize<FuncRecordTy, Endian>(Record)};	2020-02-28 18:27:06 -08:00
Vedant Kumar	1ce7fd2110	Reland: [Coverage] Revise format to reduce binary size Try again with an up-to-date version of D69471 (99317124 was a stale revision). --- Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2020-02-28 18:12:04 -08:00
Vedant Kumar	52738a45b0	Revert "[Coverage] Revise format to reduce binary size" This reverts commit 99317124e1c772e9a9de41a0cd56e1db049b4ea4. This is still busted on Windows: http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/40873 The llvm-cov tests report 'error: Could not load coverage information'.	2020-02-28 18:03:15 -08:00
Vedant Kumar	ddbbf4cb94	[Coverage] Revise format to reduce binary size Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2020-02-28 17:33:25 -08:00
Bill Wendling	0816222e8f	Revert "Remove redundant "std::move"s in return statements" The build failed with error: call to deleted constructor of 'llvm::Error' errors. This reverts commit 1c2241a7936bf85aa68aef94bd40c3ba77d8ddf2.	2020-02-10 07:07:40 -08:00
Bill Wendling	e45b5f33f3	Remove redundant "std::move"s in return statements	2020-02-10 06:39:44 -08:00
Benjamin Kramer	87d13166c7	Make llvm::StringRef to std::string conversions explicit. This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.	2020-01-28 23:25:25 +01:00
Reid Kleckner	8067073c48	[Support] Split MallocAllocator out of Allocator.h StringMap.h is very popular (4K uses), and it doesn't need to see BumpPtrAllocator, which is relatively expensive according to ClangBuildAnalyzer. StringMap only needs MallocAllocator, so split that into AllocatorBase.h and use it instead. Here is the change in header uses: $ diff -u thedeps-before.txt thedeps-after.txt | \ grep '^[-+] ' | sort | uniq -c | sort -nr 3993 + ../llvm/include/llvm/Support/AllocatorBase.h 758 - ../llvm/include/llvm/Support/Allocator.h 270 - ../llvm/include/llvm/Support/Alignment.h 13 - ../llvm/include/llvm/Support/Host.h 6 - ../llvm/include/llvm/ADT/StringMap.h 4 - ../llvm/include/llvm/Support/SwapByteOrder.h 4 - ../llvm/include/llvm/Support/MathExtras.h 4 - ../llvm/include/llvm/Support/AlignOf.h 4 - ../llvm/include/llvm/ADT/SmallVector.h 1 - ../llvm/include/llvm/Support/PointerLikeTypeTraits.h Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D73392	2020-01-24 17:29:32 -08:00
Petr Hosek	c265774f7f	[profile] Support counter relocation at runtime This is an alternative to the continous mode that was implemented in D68351. This mode relies on padding and the ability to mmap a file over the existing mapping which is generally only available on POSIX systems and isn't suitable for other platforms. This change instead introduces the ability to relocate counters at runtime using a level of indirection. On every counter access, we add a bias to the counter address. This bias is stored in a symbol that's provided by the profile runtime and is initially set to zero, meaning no relocation. The runtime can mmap the profile into memory at abitrary location, and set bias to the offset between the original and the new counter location, at which point every subsequent counter access will be to the new location, which allows updating profile directly akin to the continous mode. The advantage of this implementation is that doesn't require any special OS support. The disadvantage is the extra overhead due to additional instructions required for each counter access (overhead both in terms of binary size and performance) plus duplication of counters (i.e. one copy in the binary itself and another copy that's mmapped). Differential Revision: https://reviews.llvm.org/D69740	2020-01-17 15:02:23 -08:00
Wei Mi	866992fbb3	[SampleFDO] Fix invalid branch profile generated by indirect call promotion. Suppose an inline instance has hot total sample count but 0 entry count, and it is an indirect call target. If the indirect call has no other call target and inline instance associated with it and it is promoted, currently the conditional branch generated by indirect call promotion will have invalid branch profile which is !{!"branch_weights", i32 0, i32 0} -- because the entry count of the promoted target is 0 and the total entry count of all targets is also 0. This caused a SEGV in Control Height Reduction and may cause problem in other passes. Function entry count of an inline instance is computed by a heuristic -- using either the sample of the starting line or starting inner inline instance. The patch changes the heuristic a little bit so that when total sample count is larger than 0, the computed entry count will be at least 1. Then the new branch profile will be !{!"branch_weights", i32 1, i32 0}. Differential Revision: https://reviews.llvm.org/D72790	2020-01-15 18:36:06 -08:00
Wenlei He	bb9a3aaf60	[AutoFDO] Properly merge context-sensitive profile of inlinee back to outlined function Summary: When sample profile loader decides not to inline a previously inlined call-site, we adjust the profile of outlined function simply by scaling up its profile counts by call-site count. This means the context-sensitive profile of that inlined instance will be thrown away. This commit try to keep context-sensitive profile for such cases: - Instead of scaling outlined function's profile, we now properly merge the FunctionSamples of inlined instance into outlined function, including all recursively inlined profile. - Instead of adjusting the profile for negative inline decision at the end of the sample profile loader pass, we do the profile merge right after processing each function. This change paired with top-down ordering of annotation/inline-replay (a separate diff) will make sure we recursively merge profile back before the profile is used for annotation and inline replay. A new switch -sample-profile-merge-inlinee is added to enable the new profile merge for tuning. It should be the default behavior eventually. Reviewers: wmi, davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70653	2019-12-05 15:57:55 -08:00
Vedant Kumar	d4e3cb7cbf	Revert "[Coverage] Revise format to reduce binary size" This reverts commit e18531595bba495946aa52c0a16b9f9238cff8bc. On Windows, there is an error: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/54963/steps/stage%201%20check/logs/stdio error: C:\b\slave\sanitizer-windows\build\stage1\projects\compiler-rt\test\profile\Profile-x86_64\Output\instrprof-merging.cpp.tmp.v1.o: Failed to load coverage: Malformed coverage data	2019-12-04 10:35:14 -08:00
Vedant Kumar	bb7923fc7f	[Coverage] Revise format to reduce binary size Revise the coverage mapping format to reduce binary size by: 1. Naming function records and marking them `linkonce_odr`, and 2. Compressing filenames. This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB) and speeds up end-to-end single-threaded report generation by 10%. For reference the compressed name data in llc is 81MB (__llvm_prf_names). Rationale for changes to the format: - With the current format, most coverage function records are discarded. E.g., more than 97% of the records in llc are duplicate placeholders for functions visible-but-not-used in TUs. Placeholders are used to show under-covered functions, but duplicate placeholders waste space. - We reached general consensus about giving (1) a try at the 2017 code coverage BoF [1]. The thinking was that using `linkonce_odr` to merge duplicates is simpler than alternatives like teaching build systems about a coverage-aware database/module/etc on the side. - Revising the format is expensive due to the backwards compatibility requirement, so we might as well compress filenames while we're at it. This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB). See CoverageMappingFormat.rst for the details on what exactly has changed. Fixes PR34533 [2], hopefully. [1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html [2] https://bugs.llvm.org/show_bug.cgi?id=34533 Differential Revision: https://reviews.llvm.org/D69471	2019-12-04 10:10:55 -08:00
Simon Pilgrim	4bec5f5d38	SampleProfWriter - fix uninitialized variable warnings. NFCI.	2019-11-07 14:18:44 +00:00
Simon Pilgrim	0070b851b0	Fix uninitialized variable warning. NFCI.	2019-11-05 15:15:14 +00:00
Simon Pilgrim	642c3f5896	llvm.coverage.FunctionRecord - fix uninitialized variable warning. NFCI.	2019-11-02 18:03:22 +00:00
Vedant Kumar	c509534d2e	[profile] Add a mode to continuously sync counter updates to a file Add support for continuously syncing profile counter updates to a file. The motivation for this is that programs do not always exit cleanly. On iOS, for example, programs are usually killed via a signal from the OS. Running atexit() handlers after catching a signal is unreliable, so some method for progressively writing out profile data is necessary. The approach taken here is to mmap() the `__llvm_prf_cnts` section onto a raw profile. To do this, the linker must page-align the counter and data sections, and the runtime must ensure that counters are mapped to a page-aligned offset within a raw profile. Continuous mode is (for the moment) incompatible with the online merging mode. This limitation is lifted in https://reviews.llvm.org/D69586. Continuous mode is also (for the moment) incompatible with value profiling, as I'm not sure whether there is interest in this and the implementation may be tricky. As I have not been able to test extensively on non-Darwin platforms, only Darwin support is included for the moment. However, continuous mode may "just work" without modification on Linux and some UNIX-likes. AIUI the default value for the GNU linker's `--section-alignment` flag is set to the page size on many systems. This appears to be true for LLD as well, as its `no_nmagic` option is on by default. Continuous mode will not "just work" on Fuchsia or Windows, as it's not possible to mmap() a section on these platforms. There is a proposal to add a layer of indirection to the profile instrumentation to support these platforms. rdar://54210980 Differential Revision: https://reviews.llvm.org/D68351	2019-10-31 16:04:09 -07:00
Wei Mi	7cc7328f4b	[SampleFDO] Add profile remapping support for profile on-demand loading used by ExtBinary format profile Profile on-demand loading was added for ExtBinary format profile in rL374233, but currently profile on-demand loading doesn't work well with profile remapping. The patch adds the support. Suppose a function in the current module has outline instance in the profile. The function name in the module is different from the name of the outline instance, but remapper knows the two names are equal. When loading profile on-demand, the outline instance has to be loaded with remapper's help. At the same time SampleProfileReaderItaniumRemapper is changed from a proxy of SampleProfileReader to a helper member in SampleProfileReader. Differential Revision: https://reviews.llvm.org/D68901 llvm-svn: 375295	2019-10-18 22:35:20 +00:00
Wei Mi	cfcbdcb5f3	[SampleFDO] Add indexing for function profiles so they can be loaded on demand in ExtBinary format Currently for Text, Binary and ExtBinary format profiles, when we compile a module with samplefdo, even if there is no function showing up in the profile, we have to load all the function profiles from the profile input. That is a waste of compile time. CompactBinary format profile has already had the support of loading function profiles on demand. In this patch, we add the support to load profile on demand for ExtBinary format. It will work no matter the sections in ExtBinary format profile are compressed or not. Experiment shows it reduces the time to compile a server benchmark by 30%. When profile remapping and loading function profiles on demand are both used, extra work needs to be done so that the loading on demand process will take the name remapping into consideration. It will be addressed in a follow-up patch. Differential Revision: https://reviews.llvm.org/D68601 llvm-svn: 374233	2019-10-09 21:36:03 +00:00
Wei Mi	b8f1a4e11b	Fix build errors caused by rL373914. llvm-svn: 373919	2019-10-07 16:45:47 +00:00
Wei Mi	7850ded25a	[SampleFDO] Add compression support for any section in ExtBinary profile format Previously ExtBinary profile format only supports compression using zlib for profile symbol list. In this patch, we extend the compression support to any section. User can select some or all of the sections to compress. In an experiment, for a 45M profile in ExtBinary format, compressing name table reduced its size to 24M, and compressing all the sections reduced its size to 11M. Differential Revision: https://reviews.llvm.org/D68253 llvm-svn: 373914	2019-10-07 16:12:37 +00:00
Rong Xu	4f958a01ab	[PGO] Fix typos from r359612. NFC. llvm-svn: 373369	2019-10-01 18:06:50 +00:00
Rong Xu	1c42246e4b	[PGO] Don't group COMDAT variables for compiler generated profile variables in ELF With this patch, compiler generated profile variables will have its own COMDAT name for ELF format, which syncs the behavior with COFF. Tested with clang PGO bootstrap. This shows a modest reduction in object sizes in ELF format. Differential Revision: https://reviews.llvm.org/D68041 llvm-svn: 373241	2019-09-30 18:11:22 +00:00
Wei Mi	b265118981	Recommit [SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Fix a test failure on Mac. [SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch add those interfaces. At the same time, add some dump facility to show the size of each section. Differential revision: https://reviews.llvm.org/D67726 llvm-svn: 372478	2019-09-21 17:23:55 +00:00
Amara Emerson	ac22fb5afc	Revert "[SampleFDO] Expose an interface to return the size of a section or the size" This reverts commit f118852046a1d255ed8c65c6b5db320e8cea53a0. Broke the macOS build/greendragon bots. llvm-svn: 372464	2019-09-21 09:11:51 +00:00
Wei Mi	63f3fac875	[SampleFDO] Expose an interface to return the size of a section or the size of the profile for profile in ExtBinary format. Sometimes we want to limit the size of the profile by stripping some functions with low sample count or by stripping some function names with small text size from profile symbol list. That requires the profile reader to have the interfaces returning the size of a section or the size of total profile. The patch add those interfaces. At the same time, add some dump facility to show the size of each section. llvm-svn: 372439	2019-09-20 23:24:50 +00:00
Wei Mi	81f98562a0	[SampleFDO] Minimize performance impact when profile-sample-accurate is enabled. We can save memory and reduce binary size significantly by enabling ProfileSampleAccurate. However when ProfileSampleAccurate is true, function without sample will be regarded as cold and this could potentially cause performance regression. To minimize the potential negative performance impact, we want to be a little conservative here saying if a function shows up in the profile, no matter as outline instance, inline instance or call targets, treat the function as not being cold. This will handle the cases such as most callsites of a function are inlined in sampled binary (thus outline copy don't get any sample) but not inlined in current build (because of source code drift, imprecise debug information, or the callsites are all cold individually but not cold accumulatively...), so that the outline function showing up as cold in sampled binary will actually not be cold after current build. After the change, such function will be treated as not cold even profile-sample-accurate is enabled. At the same time we lower the hot criteria of callsiteIsHot check when profile-sample-accurate is enabled. callsiteIsHot is used to determined whether a callsite is hot and qualified for early inlining. When profile-sample-accurate is enabled, functions without profile will be regarded as cold and much less inlining will happen in CGSCC inlining pass, so we can worry less about size increase and be aggressive to allow more early inlining to happen for warm callsites and it is helpful for performance overall. Differential Revision: https://reviews.llvm.org/D67561 llvm-svn: 372232	2019-09-18 16:06:28 +00:00
Vedant Kumar	1ab21606e7	[Coverage] Speed up file-based queries for coverage info, NFC Speed up queries for coverage info in a file by reducing the amount of time spent determining whether a function record corresponds to a file. This gives a 36% speedup when generating a coverage report for `llc`. The reduction is entirely in user time. rdar://54758110 Differential Revision: https://reviews.llvm.org/D67575 llvm-svn: 372025	2019-09-16 19:08:44 +00:00
Vedant Kumar	53a68e5af8	[Coverage] Assert that filenames in a TU are unique, NFC llvm-svn: 372024	2019-09-16 19:08:41 +00:00
Vedant Kumar	9e3b309561	[InstrProf] Tighten a check for malformed data records in raw profiles The check needs to validate a counter offset before performing pointer arithmetic with the (potentially corrupt) offset. Found by UBSan's pointer overflow check. rdar://54843625 Differential Revision: https://reviews.llvm.org/D66979 llvm-svn: 370826	2019-09-03 22:23:14 +00:00
Wei Mi	f59cc38992	Fix some errors introduced by rL370563 which were not exposed on my local machine. 1. zlib::compress accept &size_t but the param is an uint64_t. 2. Some systems don't have zlib installed. Don't use compression by default. llvm-svn: 370564	2019-08-31 03:17:49 +00:00
Wei Mi	47e2f8a30e	[SampleFDO] Add profile symbol list section to discriminate function being cold versus function being newly added. This is the second half of https://reviews.llvm.org/D66374. Profile symbol list is the collection of function symbols showing up in the binary which generates the current profile. It is used to discriminate function being cold versus function being newly added. Profile symbol list is only added for profile with ExtBinary format. During profile use compilation, when profile-sample-accurate is enabled, a function without profile will be regarded as cold only when it is contained in that list. Differential Revision: https://reviews.llvm.org/D66766 llvm-svn: 370563	2019-08-31 02:27:26 +00:00
James Y Knight	3c9c5131da	Ignore object files that lack coverage information. Before this change, if multiple binary files were presented, all of them must have been instrumented or the load would fail with coverage_map_error::no_data_found. Patch by Dean Sturtevant. Differential Revision: https://reviews.llvm.org/D66763 llvm-svn: 370257	2019-08-28 20:35:50 +00:00
Wei Mi	ad26f2d41b	[SampleFDO] Extract the code calling each section reader to readOneSection. This is a followup of https://reviews.llvm.org/D66513. The code calling each section reader should be put into a separate function (readOneSection), so SampleProfileExtBinaryReader can override it. Otherwise, the base class SampleProfileExtBinaryBaseReader will need to be aware of all different kinds of section readers. That is not right. Differential Revision: https://reviews.llvm.org/D66693 llvm-svn: 369919	2019-08-26 15:54:16 +00:00
Wei Mi	076a942846	Fix some warnings introduced by r369798. llvm-svn: 369799	2019-08-23 19:39:12 +00:00
Wei Mi	77474d769d	[SampleFDO] Add ExtBinary format to support extension of binary profile. This is a patch split from https://reviews.llvm.org/D66374. It tries to add a new format of profile called ExtBinary. The format adds a section header table to the profile and organizes the profile in sections, so the future extension like adding a new section or extending an existing section will be easier while keeping backward compatibility feasible. Differential Revision: https://reviews.llvm.org/D66513 llvm-svn: 369798	2019-08-23 19:05:30 +00:00
Benjamin Kramer	e6153d53d4	Use C++14 heterogeneous lookup for a couple of std::map<std::string, ...> These call find with a StringRef; heterogeneous lookup saves a temporary std::string there. llvm-svn: 369581	2019-08-21 21:17:34 +00:00
Raphael Isemann	e42702d682	[NFC] Mark CallTargetComparator() as const to fix libc++ warnings We currently get this warning when compiling with libc++: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/set:454:26: warning: the specified comparator type does not provide a const call operator [-Wuser-defined-warnings] static_assert(sizeof(__diagnose_non_const_comparator<_Key, _Compare>()), ""); ^ llvm-project/llvm/include/llvm/ProfileData/SampleProf.h:193:29: note: in instantiation of template class 'std::__1::set<std::__1::pair<llvm::StringRef, unsigned long long>, llvm::sampleprof::SampleRecord::CallTargetComparator, std::__1::allocator<std::__1::pair<llvm::StringRef, unsigned long long> > >' requested here const SortedCallTargetSet getSortedCallTargets() const { ^ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__tree:967:5: note: from 'diagnose_if' attribute on '__diagnose_non_const_comparator<std::__1::pair<llvm::StringRef, unsigned long long>, llvm::sampleprof::SampleRecord::CallTargetComparator>': _LIBCPP_DIAGNOSE_WARNING(!std::__invokable<_Compare const&, _Tp const&, _Tp const&>::value, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__config:1320:21: note: expanded from macro '_LIBCPP_DIAGNOSE_WARNING' __attribute__((diagnose_if(__VA_ARGS__, "warning"))) ^ ~~~~~~~~~~~ 1 warning generated. llvm-svn: 369500	2019-08-21 07:39:17 +00:00
Wenlei He	733b0981cf	[AutoFDO] Make call targets order deterministic for sample profile Summary: StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output. Roundtrip test for text profile and binary profile is added. Reviewers: wmi, davidxl, danielcdh Subscribers: hiraditya, mgrang, llvm-commits, twoh Tags: #llvm Differential Revision: https://reviews.llvm.org/D66191 llvm-svn: 369440	2019-08-20 20:52:00 +00:00
Jonas Devlieghere	2c693415b7	[llvm] Migrate llvm::make_unique to std::make_unique Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. llvm-svn: 369013	2019-08-15 15:54:37 +00:00
Wenlei He	2b7cf1e1b8	[llvm-profdata] Profile dump for compact binary format Summary: Fix "llvm-profdata show" so it can work with compact binary format profile. The change is to mark all functions "used" so SampleProfileReaderCompactBinary::read will read in all profiles available for dumping. The function names will be MD5 hash for compact binary format. Reviewers: wmi, davidxl, danielcdh Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65162 llvm-svn: 368731	2019-08-13 17:56:08 +00:00
Wenlei He	49b058fb15	[ThinLTO][AutoFDO] Fix memory corruption due to race condition from thin backends Summary: This commit fixed a race condition from multi-threaded thinLTO backends that causes non-deterministic memory corruption for a data structure used only by AutoFDO with compact binary profile. GUIDToFuncNameMap, a static data member of type DenseMap in FunctionSamples is used as a per-module mapping from function name MD5 to name string when input AutoFDO profile is in compact binary format. However with ThinLTO, we can have parallel backends modifying and accessing the class static map concurrently. The fix is to make GUIDToFuncNameMap a member of SampleProfileLoader instead of a file static data. Reviewers: wmi, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65848 llvm-svn: 368596	2019-08-12 17:45:14 +00:00
Vedant Kumar	9f5b410e3f	[Coverage] Load code coverage data from archives Support loading code coverage data from regular archives, thin archives, and from MachO universal binaries which contain archives. Testing: check-llvm, check-profile (with {A,UB}San enabled) rdar://51538999 Differential Revision: https://reviews.llvm.org/D63232 llvm-svn: 363325	2019-06-13 20:48:57 +00:00
Rong Xu	d938c3cfb7	[llvm-profdata] Add overlap command to compute similarity b/w two profile files Add overlap functionality to llvm-profdata tool to compute the similarity between two profile files. Differential Revision: https://reviews.llvm.org/D60977 llvm-svn: 359612	2019-04-30 21:19:12 +00:00
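Benjamin Kramer's commit e6153d53d4 relies on C++14 heterogeneous lookup. A minimal sketch, assuming a plain `std::map` keyed by `std::string` and using C++17 `std::string_view` as a stand-in for LLVM's `StringRef`; the `TransparentMap` alias and `lookup` helper are hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> (std::less<void>) is a transparent comparator: it declares
// is_transparent, so std::map::find accepts any key type that can be
// compared with std::string, not just std::string itself.
using TransparentMap = std::map<std::string, int, std::less<>>;

// Hypothetical helper: returns the mapped value, or -1 if Name is absent.
int lookup(const TransparentMap &M, std::string_view Name) {
  // Heterogeneous lookup: Name is compared directly against the stored
  // std::string keys, so no temporary std::string is constructed.
  auto It = M.find(Name);
  return It == M.end() ? -1 : It->second;
}
```

With the default comparator `std::less<std::string>`, `find` only accepts a `std::string`, so each call would first have to convert the view into a temporary `std::string`; that allocation is what the commit avoids.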
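Commits 733b0981cf and e42702d682 together describe copying call targets into a `std::set` ordered by a `CallTargetComparator` whose call operator is `const`, so profile output no longer depends on hash-table iteration order. The sketch below is an illustration under assumptions: `std::unordered_map` stands in for `StringMap`, and the `CallTargetMap` alias, the `sortCallTargets` helper, and the descending-count-then-ascending-name ordering are hypothetical choices, not necessarily LLVM's exact code.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the StringMap of call-target name -> sample
// count; its iteration order is unspecified.
using CallTargetMap = std::unordered_map<std::string, std::uint64_t>;
using CallTarget = std::pair<std::string_view, std::uint64_t>;

// Assumed ordering: higher counts first, ties broken by name, so the order
// depends only on the data. operator() is const, which libc++ checks for
// std::set comparators (the -Wuser-defined-warnings diagnostic above).
struct CallTargetComparator {
  bool operator()(const CallTarget &LHS, const CallTarget &RHS) const {
    if (LHS.second != RHS.second)
      return LHS.second > RHS.second;
    return LHS.first < RHS.first;
  }
};

using SortedCallTargetSet = std::set<CallTarget, CallTargetComparator>;

// Hypothetical helper. The string_views refer to the keys stored in
// Targets, so Targets must outlive the returned set.
SortedCallTargetSet sortCallTargets(const CallTargetMap &Targets) {
  SortedCallTargetSet Sorted;
  for (const auto &[Name, Count] : Targets)
    Sorted.emplace(Name, Count);
  return Sorted;
}
```

Storing views rather than copies keeps the sort cheap, at the cost of tying the set's lifetime to the source map.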
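Commit 9e3b309561 validates a counter offset before using it in pointer arithmetic. A hypothetical helper showing that ordering (the name `getCounters` and its parameters are illustrative, not the raw profile reader's actual code): both checks compare sizes as unsigned integers, so an out-of-range pointer is never computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns a pointer to NumCounters counters starting
// at CounterOffset inside a buffer of BufferSize counters, or nullptr if
// the (possibly corrupt) offset or count is out of range.
const std::uint64_t *getCounters(const std::uint64_t *Begin,
                                 std::size_t BufferSize,
                                 std::uint64_t CounterOffset,
                                 std::uint64_t NumCounters) {
  // Validate with integer arithmetic first; no pointer is formed yet.
  if (CounterOffset > BufferSize)
    return nullptr;
  // CounterOffset <= BufferSize here, so this subtraction cannot wrap.
  if (NumCounters > BufferSize - CounterOffset)
    return nullptr;
  // Only now is the pointer arithmetic known to stay within the buffer
  // (at most one past the end).
  return Begin + CounterOffset;
}
```

Writing the check as `Begin + CounterOffset > End` would form the out-of-bounds pointer before comparing it, which is undefined behavior and is the pattern UBSan's pointer-overflow check reports.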


591 Commits