llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00

Author	SHA1	Message	Date
wlei	593cc2c826	[NFC] Fix build break by a initializer list converting error	2021-01-13 14:28:02 -08:00
wlei	82ce5502f2	[NFC] fix missing SectionName declaration	2021-01-13 11:30:09 -08:00
wlei	37f75ec7df	[CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe This change extends virtual unwinder to support pseudo probe in llvm-profgen. Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen. Implementation - Added `ProbeBasedCtxKey` derived from `ContextKey` for sample counter aggregation. As we need string splitting to infer the profile for callee function, string based context introduces more string handling overhead, here we just use probe pointer based context. - For linear unwinding, as inline context is encoded in each pseudo probe, we don't need to go through each instruction to extract range sharing same inliner. So just record the range for the context. - For probe based context, we should ignore the top frame probe since it will be extracted from the address range. we defer the extraction in `ProfileGeneration`. - Added `PseudoProbeProfileGenerator` for pseudo probe based profile generation. - Some helper function to get pseduo probe info(call probe, inline context) from profiled binary. - Added regression test for unwinder's output The pseudo probe based profile generation will be in the upcoming patch. Test Plan: ninja & ninja check-llvm Differential Revision: https://reviews.llvm.org/D92896	2021-01-13 11:02:58 -08:00
wlei	35a868aba5	[CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter As we plan to support both CSSPGO and AutoFDO for llvm-profgen, we will have different kinds of perf sample and different kinds of sample counter(cs/non-cs, with/without pseudo probe) which both need to do aggregation in hash map. This change implements the hashable interface(`Hashable`) and the unified base class for them to have better extensibility and reusability. Currently perf trace sample and sample counter with context implemented this `Hashable` and the class hierarchy is like: ``` \| Hashable \| PerfSample \| HybridSample \| LBRSample \| ContextKey \| StringBasedCtxKey \| ProbeBasedCtxKey \| CallsiteBasedCtxKey \| ... ``` - Class specifying `Hashable` should implement `getHashCode` and `isEqual`. Here we make `getHashCode` a non-virtual function to avoid vtable overhead, so derived class should calculate and assign the base class's HashCode manually. This also provides the flexibility for calculating the hash code incrementally(like rolling hash) during frame stack unwinding - `isEqual` is a virtual function, which will have perf overhead. In the future, if we redesign a better hash function, then we can just skip this or switch to non-virtual function. - Added `PerfSample` and `ContextKey` as base class for perf sample and counter context key, leveraging llvm-style RTTI for this. - Added `StringBasedCtxKey` class extending `ContextKey` to use string as context id. - Refactor `AggregationCounter` to take all kinds of `PerfSample` as key - Refactor `ContextSampleCounter` to take all kinds of `ContextKey` as key - Other refactoring work: - Create a wrapper class `SampleCounter` to wrap `RangeCounter` and `BranchCounter` - Hoist `ContextId` and `FunctionProfile` out of `populateFunctionBodySamples` and `populateFunctionBoundarySamples` to reuse them in ProfileGenerator Differential Revision: https://reviews.llvm.org/D92584	2021-01-13 11:02:57 -08:00
wlei	b32f59cb87	[CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling This change implements pseudo probe decoding and disassembling for llvm-profgen/CSSPGO. Please see https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen. ELF section format Please see the encoding patch(https://reviews.llvm.org/D91878) for more details of the format, just copy the example here: Two section(`.pseudo_probe_desc` and `.pseudoprobe` ) is emitted in ELF to support pseudo probe. The format of `.pseudo_probe_desc` section looks like: ``` .section .pseudo_probe_desc,"",@progbits .quad 6309742469962978389 // Func GUID .quad 4294967295 // Func Hash .byte 9 // Length of func name .ascii "_Z5funcAi" // Func name .quad 7102633082150537521 .quad 138828622701 .byte 12 .ascii "_Z8funcLeafi" .quad 446061515086924981 .quad 4294967295 .byte 9 .ascii "_Z5funcBi" .quad -2016976694713209516 .quad 72617220756 .byte 7 .ascii "_Z3fibi" ``` For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format : ``` FUNCTION BODY (one for each outlined function present in the text section) GUID (uint64) GUID of the function NPROBES (ULEB128) Number of probes originating from this function. NUM_INLINED_FUNCTIONS (ULEB128) Number of callees inlined into this function, aka number of first-level inlinees PROBE RECORDS A list of NPROBES entries. Each entry contains: INDEX (ULEB128) TYPE (uint4) 0 - block probe, 1 - indirect call, 2 - direct call ATTRIBUTE (uint3) reserved ADDRESS_TYPE (uint1) 0 - code address, 1 - address delta CODE_ADDRESS (uint64 or ULEB128) code address or address delta, depending on ADDRESS_TYPE INLINED FUNCTION RECORDS A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined callees. Each record contains: INLINE SITE GUID of the inlinee (uint64) ID of the callsite probe (ULEB128) FUNCTION BODY A FUNCTION BODY entry describing the inlined function. ``` Disassembling A switch `--show-pseudo-probe` is added to use along with `--show-disassembly` to print disassembly code with pseudo probe directives. For example: ``` 00000000002011a0 <foo2>: 2011a0: 50 push rax 2011a1: 85 ff test edi,edi [Probe]: FUNC: foo2 Index: 1 Type: Block 2011a3: 74 02 je 2011a7 <foo2+0x7> [Probe]: FUNC: foo2 Index: 3 Type: Block [Probe]: FUNC: foo2 Index: 4 Type: Block [Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6 2011a5: 58 pop rax 2011a6: c3 ret [Probe]: FUNC: foo2 Index: 2 Type: Block 2011a7: bf 01 00 00 00 mov edi,0x1 [Probe]: FUNC: foo2 Index: 5 Type: IndirectCall 2011ac: ff d6 call rsi [Probe]: FUNC: foo2 Index: 4 Type: Block 2011ae: 58 pop rax 2011af: c3 ret ``` Implementation - `PseudoProbeDecoder` is added in ProfiledBinary as an infra for the decoding. It decoded the two section and generate two map: `GUIDProbeFunctionMap` stores all the `PseudoProbeFunction` which is the abstraction of a general function. `AddressProbesMap` stores all the pseudo probe info indexed by its address. - All the inline info is encoded into binary as a trie(`PseudoProbeInlineTree`) and will be constructed from the decoding. Each pseudo probe can get its inline context(`getInlineContext`) by traversing its inline tree node backwards. Test Plan: ninja & ninja check-llvm Differential Revision: https://reviews.llvm.org/D92334	2021-01-13 11:02:57 -08:00
Kazu Hirata	cac304a74c	[llvm] Use llvm::lower_bound and llvm::upper_bound (NFC)	2021-01-05 21:15:59 -08:00
Tom Stellard	f1a54f61f5	llvm-profgen: Parse command line arguments after initializing targets I am experimenting with turning backends into loadable modules and in that scenario, target specific command line arguments won't be available until after the targets are initialized. Also, most other tools initialize targets before parsing arguments. Reviewed By: wlei Differential Revision: https://reviews.llvm.org/D93348	2020-12-21 15:13:10 -08:00
wlei	189f7cb2ab	[llvm-profgen][NFC] Fix test failure by making unwinder's output deterministic Don't know why under Sanitizer build(asan/msan/ubsan), the `std::unordered_map<string, ...>`'s output order is reversed, make the regression test failed. This change creates a workaround by using sorted container to make the output deterministic. Reviewed By: hoy, wenlei Differential Revision: https://reviews.llvm.org/D92816	2020-12-07 22:36:25 -08:00
wlei	db7fa377e4	[CSSPGO][llvm-profgen] Context-sensitive profile data generation This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path. we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding. Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context. With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO. A breakdown of noteworthy changes: - Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack * Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted * Speed up by aggregating `HybridSample` into `AggregatedSamples` * Added VirtualUnwinder that consumes aggregated `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context.  Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`. * Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.  * Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop - `getCanonicalFnName` for callee name and name from ELF section - Added regression test for both unwinding and profile generation Test Plan: ninja & ninja check-llvm Reviewed By: hoy, wenlei, wmi Differential Revision: https://reviews.llvm.org/D89723	2020-12-07 13:48:58 -08:00
Chris Sears	ee3732c1ab	[llvmbuildectomy] removed vestigial LLVMBuild.txt files LLVMBuild has been removed from the build system. However, three LLVMBuild.txt files remain in the tree. This patch simply removes them. llvm/lib/ExecutionEngine/Orc/TargetProcess/LLVMBuild.txt llvm/tools/llvm-jitlink/llvm-jitlink-executor/LLVMBuild.txt llvm/tools/llvm-profgen/LLVMBuild.txt Differential Revision: https://reviews.llvm.org/D92693	2020-12-05 22:00:22 +01:00
Georgii Rymar	65bd36f2bc	[llvm-profgen] - Fix compilation issue after ELFFile<ELFT> interface update. `D92560` changed `ELFObjectFile::getELFFile` to return reference.	2020-12-04 16:09:25 +03:00
Michael Liao	ec57414791	Fix shared build.	2020-11-21 17:07:42 -05:00
wlei	dd799fc98b	[llvm-profgen][NFC]Fix build failure on different platform see titile Test Plan: ninja & ninja check-llvm Reviewed By: hoy Differential Revision: https://reviews.llvm.org/D91897	2020-11-20 16:36:04 -08:00
wlei	06a316e7a3	[CSSPGO][llvm-profgen] Instruction symbolization This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change adds the support of instruction symbolization. Given the RVA on an instruction pointer, a full calling context can be printed side-by-side with the disassembly code. E.g. ``` Disassembly of section .text [0x0, 0x4a]: <funcA>: 0: mov eax, edi funcA:0 2: mov ecx, dword ptr [rip] funcLeaf:2 @ funcA:1 8: lea edx, [rcx + 3] fib:2 @ funcLeaf:2 @ funcA:1 b: cmp ecx, 3 fib:2 @ funcLeaf:2 @ funcA:1 e: cmovl edx, ecx fib:2 @ funcLeaf:2 @ funcA:1 11: sub eax, edx funcLeaf:2 @ funcA:1 13: ret funcA:2 14: nop word ptr cs:[rax + rax] 1e: nop <funcLeaf>: 20: mov eax, edi funcLeaf:1 22: mov ecx, dword ptr [rip] funcLeaf:2 28: lea edx, [rcx + 3] fib:2 @ funcLeaf:2 2b: cmp ecx, 3 fib:2 @ funcLeaf:2 2e: cmovl edx, ecx fib:2 @ funcLeaf:2 31: sub eax, edx funcLeaf:2 33: ret funcLeaf:3 34: nop word ptr cs:[rax + rax] 3e: nop <fib>: 40: lea eax, [rdi + 3] fib:2 43: cmp edi, 3 fib:2 46: cmovl eax, edi fib:2 49: ret fib:8 ``` Test Plan: ninja check-llvm Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D89715	2020-11-20 14:26:27 -08:00
wlei	563627eb7b	[CSSPGO][llvm-profgen] Disassemble text sections This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. This change enables disassembling the text sections to build various address maps that are potentially used by the virtual unwinder. A switch `--show-disassembly` is being added to print the disassembly code. Like the llvm-objdump tool, this change leverages existing LLVM components to parse and disassemble ELF binary files. So far X86 is supported. Test Plan: ninja check-llvm Reviewed By: wmi, wenlei Differential Revision: https://reviews.llvm.org/D89712	2020-11-20 14:26:26 -08:00
wlei	cc95d46ee1	[CSSPGO][llvm-profgen] Parse mmap events from perf script This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC. As a starter, this change sets up an entry point by introducing PerfReader to load profiled binaries and perf traces(including perf events and perf samples). For the event, here it parses the mmap2 events from perf script to build the loader snaps, which is used to retrieve the image load address in the subsequent perf tracing parsing. As described in llvm-profgen.rst, the tool being built aims to support multiple input perf data (preprocessed by perf script) as well as multiple input binary images. It should also support dynamic reload/unload shared objects by leveraging the loader snaps being built by this change Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D89707	2020-11-20 14:26:26 -08:00

16 Commits