mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 18:42:46 +02:00
llvm-mirror/tools/llvm-xray/xray-account.h
Roman Lebedev 2e1aaf1aa5 [XRay] Account: recursion detection
Summary:
Recursion detection can be non-trivial. Currently, the state of the art for LLVM,
as far as I'm aware, is D72362 `[clang-tidy] misc-no-recursion: a new check`.
However, it is quite limited:
* It does very basic call-graph based analysis, in the sense it will report even dynamically-unreachable recursion.
* It is inherently limited to a single TU
* It is hard to gauge how problematic each recursion is in practice.

Some of that can be addressed by adding a clang analyzer-based check,
which would at least support multiple TUs.

However, we can approach this problem from another angle - dynamic run-time analysis.
We already have means to capture a run-time callgraph (XRay, duh),
and there are already means to reconstruct it within `llvm-xray` tool.

This proposes to add a `-recursive-calls-only` switch to the `account` tool.
When the switch is on, while reconstructing the call graph for latency accounting,
each time we enter/leave a function, we increment/decrement the entry for that function
in a "recursion depth" map. If, when we leave the function, said entry is at `1`,
then the function didn't call itself; if it is at `2` or more,
then the function (possibly indirectly) called itself.

If the depth is 1, we don't account the time spent there,
unless within this call stack the function already recursed into itself.
Note that we don't pay for recursion depth tracking when `-recursive-calls-only` is not on,
and the perf impact is insignificant (+0.3% regression).
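
The depth-map rule described above can be sketched in isolation. This is a minimal illustration, not the code in `xray-account.cpp`: the `Event` and `DepthState` types and the `countRecursiveExits` function are hypothetical names, and the sketch assumes a single thread and a well-nested trace.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical event type for this sketch; real XRay records carry more
// fields (TSC, thread id, CPU, record type, ...).
struct Event {
  bool IsEntry;
  int32_t FuncId;
};

// Per-function bookkeeping (hypothetical).
struct DepthState {
  int Depth = 0;         // Live frames of this function on the stack.
  bool Recursed = false; // Set once Depth reaches 2, cleared when it hits 0.
};

// Returns, per function id, how many exits would be accounted under a
// "recursive calls only" policy.
std::map<int32_t, int> countRecursiveExits(const std::vector<Event> &Trace) {
  std::map<int32_t, DepthState> State;
  std::map<int32_t, int> Accounted;
  for (const Event &E : Trace) {
    DepthState &S = State[E.FuncId];
    if (E.IsEntry) {
      if (++S.Depth >= 2)
        S.Recursed = true;
      continue;
    }
    assert(S.Depth > 0 && "exit without a matching entry");
    // Account the exit if the function is nested in itself right now, or if
    // it already recursed within the same outermost activation.
    if (S.Depth >= 2 || S.Recursed)
      ++Accounted[E.FuncId];
    if (--S.Depth == 0)
      S.Recursed = false;
  }
  return Accounted;
}
```

For a trace where function 1 calls function 2 and then calls itself, both exits of function 1 are accounted, while the exit of function 2 is skipped.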

The overhead of the option itself is actually negative: around -5.26% user time on a medium-sized (3.5G) XRay log.
As a practical example, that 3.5G log is a capture of the entire middle-end opt pipeline
at `-O3` for a RawSpeed unity build. There are a total of `5500` functions in the log,
but `-recursive-calls-only` reports that only `269`, or about 5%, are recursive.

Having this functionality could be helpful for recursion eradication.

Reviewers: dberris, mboerger

Reviewed By: dberris

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84582
2020-07-27 10:15:44 +03:00


//===- xray-account.h - XRay Function Call Accounting ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file defines the interface for performing some basic function call
// accounting from an XRay trace.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_TOOLS_LLVM_XRAY_XRAY_ACCOUNT_H
#define LLVM_TOOLS_LLVM_XRAY_XRAY_ACCOUNT_H
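#include <cstdint>
#include <map>
#include <utility>
#include <vector>

#include "func-id-helper.h"
#include "llvm/ADT/Bitfields.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Support/Program.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/XRay/XRayRecord.h"
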
namespace llvm {
namespace xray {
class LatencyAccountant {
public:
  typedef llvm::DenseMap<int32_t, llvm::SmallVector<uint64_t, 0>>
      FunctionLatencyMap;
  typedef llvm::DenseMap<uint32_t, std::pair<uint64_t, uint64_t>>
      PerThreadMinMaxTSCMap;
  typedef llvm::DenseMap<uint8_t, std::pair<uint64_t, uint64_t>>
      PerCPUMinMaxTSCMap;
  struct FunctionStack {
    llvm::SmallVector<std::pair<int32_t, uint64_t>, 32> Stack;
    class RecursionStatus {
      uint32_t Storage = 0;
      using Depth = Bitfield::Element<int32_t, 0, 31>;    // Low 31 bits.
      using IsRecursive = Bitfield::Element<bool, 31, 1>; // Sign bit.
    public:
      RecursionStatus &operator++();
      RecursionStatus &operator--();
      bool isRecursive() const;
    };
    Optional<llvm::DenseMap<int32_t, RecursionStatus>> RecursionDepth;
  };
  typedef llvm::DenseMap<uint32_t, FunctionStack> PerThreadFunctionStackMap;

private:
  PerThreadFunctionStackMap PerThreadFunctionStack;
  FunctionLatencyMap FunctionLatencies;
  PerThreadMinMaxTSCMap PerThreadMinMaxTSC;
  PerCPUMinMaxTSCMap PerCPUMinMaxTSC;
  FuncIdConversionHelper &FuncIdHelper;

  bool RecursiveCallsOnly = false;
  bool DeduceSiblingCalls = false;
  uint64_t CurrentMaxTSC = 0;

  void recordLatency(int32_t FuncId, uint64_t Latency) {
    FunctionLatencies[FuncId].push_back(Latency);
  }

public:
  explicit LatencyAccountant(FuncIdConversionHelper &FuncIdHelper,
                             bool RecursiveCallsOnly, bool DeduceSiblingCalls)
      : FuncIdHelper(FuncIdHelper), RecursiveCallsOnly(RecursiveCallsOnly),
        DeduceSiblingCalls(DeduceSiblingCalls) {}

  const FunctionLatencyMap &getFunctionLatencies() const {
    return FunctionLatencies;
  }

  const PerThreadMinMaxTSCMap &getPerThreadMinMaxTSC() const {
    return PerThreadMinMaxTSC;
  }

  const PerCPUMinMaxTSCMap &getPerCPUMinMaxTSC() const {
    return PerCPUMinMaxTSC;
  }

  /// Returns false in case we fail to account the provided record. This happens
  /// in the following cases:
  ///
  ///   - An exit record does not match any entry records for the same function.
  ///     If we've been set to deduce sibling calls, we try walking up the stack
  ///     and recording times for the higher level functions.
  ///   - A record has a TSC that's before the latest TSC that has been
  ///     recorded. We still record the TSC for the min-max.
  ///
  bool accountRecord(const XRayRecord &Record);

  const PerThreadFunctionStackMap &getPerThreadFunctionStack() const {
    return PerThreadFunctionStack;
  }

  // Output Functions
  // ================

  void exportStatsAsText(raw_ostream &OS, const XRayFileHeader &Header) const;
  void exportStatsAsCSV(raw_ostream &OS, const XRayFileHeader &Header) const;

private:
  // Internal helper to implement common parts of the exportStatsAs...
  // functions.
  template <class F> void exportStats(const XRayFileHeader &Header, F fn) const;
};

} // namespace xray
} // namespace llvm
#endif // LLVM_TOOLS_LLVM_XRAY_XRAY_ACCOUNT_H
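
The header only declares `RecursionStatus`, which packs a depth counter into the low 31 bits of a `uint32_t` and a recursion flag into the top bit. The following standalone sketch shows one way such a packed counter could behave. It uses plain bit operations instead of `llvm::Bitfield`. The class name `RecursionStatusSketch` and the `depth()` accessor are hypothetical, and the policy of setting the flag at depth 2 and clearing it at depth 0 is an assumption drawn from the commit message, not a copy of the definitions in `xray-account.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Standalone sketch; not the LLVM class. Low 31 bits hold the depth, the top
// bit holds a sticky "this function recursed" flag.
class RecursionStatusSketch {
  uint32_t Storage = 0;
  static constexpr uint32_t FlagMask = 1u << 31;      // Top bit.
  static constexpr uint32_t DepthMask = FlagMask - 1; // Low 31 bits.

public:
  uint32_t depth() const { return Storage & DepthMask; }
  bool isRecursive() const { return (Storage & FlagMask) != 0; }

  // Called on function entry.
  RecursionStatusSketch &operator++() {
    assert(depth() < DepthMask && "depth overflow");
    ++Storage; // Cannot carry into the flag bit: depth was below DepthMask.
    if (depth() == 2)
      Storage |= FlagMask; // The function is now nested inside itself.
    return *this;
  }

  // Called on function exit.
  RecursionStatusSketch &operator--() {
    assert(depth() > 0 && "unbalanced exit");
    --Storage; // Cannot borrow from the flag bit: depth was above zero.
    if (depth() == 0)
      Storage = 0; // Outermost activation finished; drop the sticky flag.
    return *this;
  }
};
```

Because the flag stays set until the depth returns to zero, the outermost activation of a function that recursed is still reported as recursive when it exits, which matches the "unless within this call stack the function already recursed into itself" rule in the commit message.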