1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00
llvm-mirror/tools/llvm-profgen/ProfiledBinary.h
wlei db7fa377e4 [CSSPGO][llvm-profgen] Context-sensitive profile data generation
This stack of changes introduces `llvm-profgen` utility which generates a profile data file from given perf script data files for sample-based PGO. It’s part of(not only) the CSSPGO work. Specifically to support context-sensitive with/without pseudo probe profile, it implements a series of functionalities including perf trace parsing, instruction symbolization, LBR stack/call frame stack unwinding, pseudo probe decoding, etc. Also high throughput is achieved by multiple levels of sample aggregation and compatible format with one stop is generated at the end. Please refer to: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s for the CSSPGO RFC.

This change supports context-sensitive profile data generation into llvm-profgen. With simultaneous sampling for LBR and call stack, we can identify leaf of LBR sample with calling context from stack sample . During the process of deriving fall through path from LBR entries, we unwind LBR by replaying all the calls and returns (including implicit calls/returns due to inlining) backwards on top of the sampled call stack. Then the state of call stack as we unwind through LBR always represents the calling context of current fall through path.

we have two types of virtual unwinding 1) LBR unwinding and 2) linear range unwinding.
Specifically, for each LBR entry which can be classified into call, return, regular branch, LBR unwinding will replay the operation by pushing, popping or switching leaf frame towards the call stack and since the initial call stack is most recently sampled, the replay should be in anti-execution order, i.e. for the regular case, pop the call stack when LBR is call, push frame on call stack when LBR is return. After each LBR processed, it also needs to align with the next LBR by going through instructions from previous LBR's target to current LBR's source, which we named linear unwinding. As instruction from linear range can come from different function by inlining, linear unwinding will do the range splitting and record counters through the range with same inline context.

With each fall through path from LBR unwinding, we aggregate each sample into counters by the calling context and eventually generate full context sensitive profile (without relying on inlining) to driver compiler's PGO/FDO.

A breakdown of noteworthy changes:
- Added `HybridSample` class as the abstraction perf sample including LBR stack and call stack
* Extended `PerfReader` to implement auto-detect whether input perf script output contains CS profile, then do the parsing. Multiple `HybridSample` are extracted
* Speed up by aggregating  `HybridSample` into `AggregatedSamples`
* Added VirtualUnwinder that consumes aggregated  `HybridSample` and implements unwinding of calls, returns, and linear path that contains implicit call/return from inlining. Ranges and branches counters are aggregated by the calling context.
 Here calling context is string type, each context is a pair of function name and callsite location info, the whole context is like `main:1 @ foo:2 @ bar`.
* Added PorfileGenerater that accumulates counters by ranges unfolding or branch target mapping, then generates context-sensitive function profile including function body, inferring callee's head sample, callsite target samples, eventually records into ProfileMap.

* Leveraged LLVM build-in(`SampleProfWriter`) writer to support different serialization format with no stop
- `getCanonicalFnName` for callee name and name from ELF section
- Added regression test for both unwinding and profile generation

Test Plan:
ninja & ninja check-llvm

Reviewed By: hoy, wenlei, wmi

Differential Revision: https://reviews.llvm.org/D89723
2020-12-07 13:48:58 -08:00

236 lines
8.6 KiB
C++

//===-- ProfiledBinary.h - Binary decoder -----------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_TOOLS_LLVM_PROFGEN_PROFILEDBINARY_H
#define LLVM_TOOLS_LLVM_PROFGEN_PROFILEDBINARY_H
#include "CallContext.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/DebugInfo/Symbolize/Symbolize.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCTargetOptions.h"
#include "llvm/Object/ELFObjectFile.h"
#include "llvm/ProfileData/SampleProf.h"
#include "llvm/Support/Path.h"
#include <list>
#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace llvm;
using namespace sampleprof;
using namespace llvm::object;
namespace llvm {
namespace sampleprof {
class ProfiledBinary;
struct InstructionPointer {
ProfiledBinary *Binary;
union {
// Offset of the executable segment of the binary.
uint64_t Offset = 0;
// Also used as address in unwinder
uint64_t Address;
};
// Index to the sorted code address array of the binary.
uint64_t Index = 0;
InstructionPointer(ProfiledBinary *Binary, uint64_t Address,
bool RoundToNext = false);
void advance();
void backward();
void update(uint64_t Addr);
};
// PrologEpilog offset tracker, used to filter out broken stack samples
// Currently we use a heuristic size (two) to infer prolog and epilog
// based on the start address and return address. In the future,
// we will switch to Dwarf CFI based tracker
struct PrologEpilogTracker {
// A set of prolog and epilog offsets. Used by virtual unwinding.
std::unordered_set<uint64_t> PrologEpilogSet;
ProfiledBinary *Binary;
PrologEpilogTracker(ProfiledBinary *Bin) : Binary(Bin){};
// Take the two addresses from the start of function as prolog
void inferPrologOffsets(
std::unordered_map<uint64_t, std::string> &FuncStartAddrMap) {
for (auto I : FuncStartAddrMap) {
PrologEpilogSet.insert(I.first);
InstructionPointer IP(Binary, I.first);
IP.advance();
PrologEpilogSet.insert(IP.Offset);
}
}
// Take the last two addresses before the return address as epilog
void inferEpilogOffsets(std::unordered_set<uint64_t> &RetAddrs) {
for (auto Addr : RetAddrs) {
PrologEpilogSet.insert(Addr);
InstructionPointer IP(Binary, Addr);
IP.backward();
PrologEpilogSet.insert(IP.Offset);
}
}
};
class ProfiledBinary {
// Absolute path of the binary.
std::string Path;
// The target triple.
Triple TheTriple;
// The runtime base address that the executable sections are loaded at.
mutable uint64_t BaseAddress = 0;
// The preferred base address that the executable sections are loaded at.
uint64_t PreferredBaseAddress = 0;
// Mutiple MC component info
std::unique_ptr<const MCRegisterInfo> MRI;
std::unique_ptr<const MCAsmInfo> AsmInfo;
std::unique_ptr<const MCSubtargetInfo> STI;
std::unique_ptr<const MCInstrInfo> MII;
std::unique_ptr<MCDisassembler> DisAsm;
std::unique_ptr<const MCInstrAnalysis> MIA;
std::unique_ptr<MCInstPrinter> IPrinter;
// A list of text sections sorted by start RVA and size. Used to check
// if a given RVA is a valid code address.
std::set<std::pair<uint64_t, uint64_t>> TextSections;
// Function offset to name mapping.
std::unordered_map<uint64_t, std::string> FuncStartAddrMap;
// Offset to context location map. Used to expand the context.
std::unordered_map<uint64_t, FrameLocationStack> Offset2LocStackMap;
// An array of offsets of all instructions sorted in increasing order. The
// sorting is needed to fast advance to the next forward/backward instruction.
std::vector<uint64_t> CodeAddrs;
// A set of call instruction offsets. Used by virtual unwinding.
std::unordered_set<uint64_t> CallAddrs;
// A set of return instruction offsets. Used by virtual unwinding.
std::unordered_set<uint64_t> RetAddrs;
PrologEpilogTracker ProEpilogTracker;
// The symbolizer used to get inline context for an instruction.
std::unique_ptr<symbolize::LLVMSymbolizer> Symbolizer;
void setPreferredBaseAddress(const ELFObjectFileBase *O);
// Set up disassembler and related components.
void setUpDisassembler(const ELFObjectFileBase *Obj);
void setupSymbolizer();
/// Dissassemble the text section and build various address maps.
void disassemble(const ELFObjectFileBase *O);
/// Helper function to dissassemble the symbol and extract info for unwinding
bool dissassembleSymbol(std::size_t SI, ArrayRef<uint8_t> Bytes,
SectionSymbolsTy &Symbols, const SectionRef &Section);
/// Symbolize a given instruction pointer and return a full call context.
FrameLocationStack symbolize(const InstructionPointer &IP,
bool UseCanonicalFnName = false);
/// Decode the interesting parts of the binary and build internal data
/// structures. On high level, the parts of interest are:
/// 1. Text sections, including the main code section and the PLT
/// entries that will be used to handle cross-module call transitions.
/// 2. The .debug_line section, used by Dwarf-based profile generation.
/// 3. Pseudo probe related sections, used by probe-based profile
/// generation.
void load();
const FrameLocationStack &getFrameLocationStack(uint64_t Offset) const {
auto I = Offset2LocStackMap.find(Offset);
assert(I != Offset2LocStackMap.end() &&
"Can't find location for offset in the binary");
return I->second;
}
public:
ProfiledBinary(StringRef Path) : Path(Path), ProEpilogTracker(this) {
setupSymbolizer();
load();
}
uint64_t virtualAddrToOffset(uint64_t VitualAddress) const {
return VitualAddress - BaseAddress;
}
uint64_t offsetToVirtualAddr(uint64_t Offset) const {
return Offset + BaseAddress;
}
const StringRef getPath() const { return Path; }
const StringRef getName() const { return llvm::sys::path::filename(Path); }
uint64_t getBaseAddress() const { return BaseAddress; }
void setBaseAddress(uint64_t Address) { BaseAddress = Address; }
uint64_t getPreferredBaseAddress() const { return PreferredBaseAddress; }
bool addressIsCode(uint64_t Address) const {
uint64_t Offset = virtualAddrToOffset(Address);
return Offset2LocStackMap.find(Offset) != Offset2LocStackMap.end();
}
bool addressIsCall(uint64_t Address) const {
uint64_t Offset = virtualAddrToOffset(Address);
return CallAddrs.count(Offset);
}
bool addressIsReturn(uint64_t Address) const {
uint64_t Offset = virtualAddrToOffset(Address);
return RetAddrs.count(Offset);
}
bool addressInPrologEpilog(uint64_t Address) const {
uint64_t Offset = virtualAddrToOffset(Address);
return ProEpilogTracker.PrologEpilogSet.count(Offset);
}
uint64_t getAddressforIndex(uint64_t Index) const {
return offsetToVirtualAddr(CodeAddrs[Index]);
}
// Get the index in CodeAddrs for the address
// As we might get an address which is not the code
// here it would round to the next valid code address by
// using lower bound operation
uint32_t getIndexForAddr(uint64_t Address) const {
uint64_t Offset = virtualAddrToOffset(Address);
auto Low = std::lower_bound(CodeAddrs.begin(), CodeAddrs.end(), Offset);
return Low - CodeAddrs.begin();
}
uint64_t getCallAddrFromFrameAddr(uint64_t FrameAddr) const {
return getAddressforIndex(getIndexForAddr(FrameAddr) - 1);
}
StringRef getFuncFromStartOffset(uint64_t Offset) {
return FuncStartAddrMap[Offset];
}
const FrameLocation &getInlineLeafFrameLoc(uint64_t Offset,
bool NameOnly = false) {
return getFrameLocationStack(Offset).back();
}
// Compare two addresses' inline context
bool inlineContextEqual(uint64_t Add1, uint64_t Add2) const;
// Get the context string of the current stack with inline context filled in.
// It will search the disassembling info stored in Offset2LocStackMap. This is
// used as the key of function sample map
std::string getExpandedContextStr(const std::list<uint64_t> &stack) const;
};
} // end namespace sampleprof
} // end namespace llvm
#endif