1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
llvm-mirror/include/llvm/Analysis/CodeMetrics.h
Florian Hahn d596025713 [LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have be
vectorized earlier (by scalarzing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00

95 lines
3.2 KiB
C++

//===- CodeMetrics.h - Code cost measurements -------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements various weight measurements for code, helping
// the Inliner and other passes decide whether to duplicate its contents.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_ANALYSIS_CODEMETRICS_H
#define LLVM_ANALYSIS_CODEMETRICS_H
#include "llvm/ADT/DenseMap.h"
namespace llvm {
class AssumptionCache;
class BasicBlock;
class Loop;
class Function;
template <class T> class SmallPtrSetImpl;
class TargetTransformInfo;
class Value;
/// Utility to calculate the size and a few similar metrics for a set
/// of basic blocks.
struct CodeMetrics {
/// True if this function contains a call to setjmp or other functions
/// with attribute "returns twice" without having the attribute itself.
bool exposesReturnsTwice = false;
/// True if this function calls itself.
bool isRecursive = false;
/// True if this function cannot be duplicated.
///
/// True if this function contains one or more indirect branches, or it contains
/// one or more 'noduplicate' instructions.
bool notDuplicatable = false;
/// True if this function contains a call to a convergent function.
bool convergent = false;
/// True if this function calls alloca (in the C sense).
bool usesDynamicAlloca = false;
/// Number of instructions in the analyzed blocks.
unsigned NumInsts = false;
/// Number of analyzed blocks.
unsigned NumBlocks = false;
/// Keeps track of basic block code size estimates.
DenseMap<const BasicBlock *, unsigned> NumBBInsts;
/// Keep track of the number of calls to 'big' functions.
unsigned NumCalls = false;
/// The number of calls to internal functions with a single caller.
///
/// These are likely targets for future inlining, likely exposed by
/// interleaved devirtualization.
unsigned NumInlineCandidates = 0;
/// How many instructions produce vector values.
///
/// The inliner is more aggressive with inlining vector kernels.
unsigned NumVectorInsts = 0;
/// How many 'ret' instructions the blocks contain.
unsigned NumRets = 0;
/// Add information about a block to the current state.
void analyzeBasicBlock(const BasicBlock *BB, const TargetTransformInfo &TTI,
const SmallPtrSetImpl<const Value *> &EphValues,
bool PrepareForLTO = false);
/// Collect a loop's ephemeral values (those used only by an assume
/// or similar intrinsics in the loop).
static void collectEphemeralValues(const Loop *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);
/// Collect a functions's ephemeral values (those used only by an
/// assume or similar intrinsics in the function).
static void collectEphemeralValues(const Function *L, AssumptionCache *AC,
SmallPtrSetImpl<const Value *> &EphValues);
};
}
#endif