mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-22 02:33:06 +01:00
[MCA] Add support for in-order CPUs
This patch adds a pipeline to support in-order CPUs such as ARM Cortex-A55. The in-order pipeline implements a simplified version of the Dispatch, Scheduler and Execute stages as a single stage. The Entry and Retire stages are common to both the in-order and out-of-order pipelines. Differential Revision: https://reviews.llvm.org/D94928
This commit is contained in:
parent
848f45b02f
commit
064cc1a22c
@ -16,8 +16,8 @@ available in LLVM (e.g. scheduling models) to statically measure the performance
|
||||
of machine code in a specific CPU.
|
||||
|
||||
Performance is measured in terms of throughput as well as processor resource
|
||||
consumption. The tool currently works for processors with an out-of-order
|
||||
backend, for which there is a scheduling model available in LLVM.
|
||||
consumption. The tool currently works for processors with a backend for which
|
||||
there is a scheduling model available in LLVM.
|
||||
|
||||
The main goal of this tool is not just to predict the performance of the code
|
||||
when run on the target, but also help with diagnosing potential performance
|
||||
@ -204,7 +204,8 @@ option specifies "``-``", then the output will also be sent to standard output.
|
||||
|
||||
Print information about bottlenecks that affect the throughput. This analysis
|
||||
can be expensive, and it is disabled by default. Bottlenecks are highlighted
|
||||
in the summary view.
|
||||
in the summary view. Bottleneck analysis is currently not supported for
|
||||
processors with an in-order backend.
|
||||
|
||||
.. option:: -json
|
||||
|
||||
@ -388,7 +389,9 @@ overview of the performance throughput. Important performance indicators are
|
||||
Throughput).
|
||||
|
||||
Field *DispatchWidth* is the maximum number of micro opcodes that are dispatched
|
||||
to the out-of-order backend every simulated cycle.
|
||||
to the out-of-order backend every simulated cycle. For processors with an
|
||||
in-order backend, *DispatchWidth* is the maximum number of micro opcodes issued
|
||||
to the backend every simulated cycle.
|
||||
|
||||
IPC is computed by dividing the total number of simulated instructions by the total
|
||||
number of cycles.
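As a small worked example of the IPC definition above, the sketch below divides an instruction count by a cycle count. The helper name `computeIPC` is hypothetical and is not part of llvm-mca; it only illustrates the arithmetic.

```cpp
#include <cassert>

// Hypothetical helper (not an llvm-mca API): instructions per cycle,
// i.e. simulated instructions divided by simulated cycles.
double computeIPC(unsigned NumInstructions, unsigned NumCycles) {
  assert(NumCycles != 0 && "cannot compute IPC over zero cycles");
  return static_cast<double>(NumInstructions) / NumCycles;
}
```

For instance, 8 simulated instructions over 10 simulated cycles give an IPC of 0.80.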
|
||||
@ -653,6 +656,8 @@ performance. By construction, the accuracy of this analysis is strongly
|
||||
dependent on the simulation and (as always) on the quality of the processor
|
||||
model in llvm.
|
||||
|
||||
Bottleneck analysis is currently not supported for processors with an in-order
|
||||
backend.
|
||||
|
||||
Extra Statistics to Further Diagnose Performance Issues
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
@ -797,11 +802,14 @@ process instructions.
|
||||
* Write Back (Instruction is executed, and results are written back).
|
||||
* Retire (Instruction is retired; writes are architecturally committed).
|
||||
|
||||
The default pipeline only models the out-of-order portion of a processor.
|
||||
Therefore, the instruction fetch and decode stages are not modeled. Performance
|
||||
bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
|
||||
instructions have all been decoded and placed into a queue before the simulation
|
||||
starts. Also, :program:`llvm-mca` does not model branch prediction.
|
||||
The in-order pipeline implements the following sequence of stages:
|
||||
* InOrderIssue (Instruction is issued to the processor pipelines).
|
||||
* Retire (Instruction is retired; writes are architecturally committed).
|
||||
|
||||
:program:`llvm-mca` assumes that instructions have all been decoded and placed
|
||||
into a queue before the simulation starts. Therefore, the instruction fetch and
|
||||
decode stages are not modeled. Performance bottlenecks in the frontend are not
|
||||
diagnosed. Also, :program:`llvm-mca` does not model branch prediction.
|
||||
|
||||
Instruction Dispatch
|
||||
""""""""""""""""""""
|
||||
@ -957,3 +965,17 @@ In conclusion, the full set of load/store consistency rules are:
|
||||
#. A load may pass a previous load.
|
||||
#. A load may not pass a previous store unless ``-noalias`` is set.
|
||||
#. A load has to wait until an older load barrier is fully executed.
|
||||
|
||||
In-order Issue and Execute
|
||||
""""""""""""""""""""""""""""""""""""
|
||||
In-order processors are modelled as a single ``InOrderIssueStage`` stage. It
|
||||
bypasses Dispatch, Scheduler and Load/Store unit. Instructions are issued as
|
||||
soon as their operand registers are available and resource requirements are
|
||||
met. Multiple instructions can be issued in one cycle according to the value of
|
||||
the ``IssueWidth`` parameter in LLVM's scheduling model.
|
||||
|
||||
Once issued, an instruction is moved to the ``IssuedInst`` set until it is ready to
|
||||
retire. If a ``RetireControlUnit`` is defined in LLVM's scheduling model,
|
||||
:program:`llvm-mca` ensures that instructions are retired in-order. However, an
|
||||
instruction is allowed to retire out-of-order if the ``RetireOOO`` property is true
|
||||
for at least one of its writes.
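The issue rule described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the names `ToyInst` and `simulateInOrderIssue` are hypothetical, registers are plain indices, and the sketch ignores processor resources, ``ReadAdvance`` and ``RetireOOO``. It is not how ``InOrderIssueStage`` is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy instruction: one destination, some sources, a latency.
struct ToyInst {
  unsigned Dest;              // destination register index
  std::vector<unsigned> Srcs; // source register indices
  unsigned Latency;           // cycles until Dest becomes readable
};

// Returns the cycle in which each instruction is issued. Instructions issue
// strictly in program order, at most IssueWidth per cycle, and only once all
// of their source registers are readable.
std::vector<unsigned> simulateInOrderIssue(const std::vector<ToyInst> &Prog,
                                           unsigned IssueWidth,
                                           unsigned NumRegs) {
  assert(IssueWidth > 0 && "need at least one issue slot per cycle");
  std::vector<unsigned> ReadyAt(NumRegs, 0); // cycle each register is readable
  std::vector<unsigned> IssueCycle;
  unsigned Cycle = 0, IssuedThisCycle = 0;

  for (const ToyInst &I : Prog) {
    // All issue slots of this cycle are taken: move to the next cycle.
    if (IssuedThisCycle == IssueWidth) {
      ++Cycle;
      IssuedThisCycle = 0;
    }
    // Wait for the operands; a stall also blocks every younger instruction.
    unsigned Earliest = Cycle;
    for (unsigned S : I.Srcs) {
      assert(S < NumRegs && "source register out of range");
      Earliest = std::max(Earliest, ReadyAt[S]);
    }
    if (Earliest > Cycle) {
      Cycle = Earliest;
      IssuedThisCycle = 0;
    }
    assert(I.Dest < NumRegs && "destination register out of range");
    IssueCycle.push_back(Cycle);
    ++IssuedThisCycle;
    ReadyAt[I.Dest] = Cycle + I.Latency;
  }
  return IssueCycle;
}
```

With an issue width of 2 and a latency of 3 for every instruction, two independent instructions followed by a two-instruction dependency chain issue in cycles 0, 0, 3 and 6 in this toy model. Real scheduling models may shorten such chains through ``ReadAdvance``, which the sketch leaves out.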
|
||||
|
@ -130,6 +130,9 @@ Changes to the LLVM tools
|
||||
* The options ``--build-id-link-{dir,input,output}`` have been deleted.
|
||||
(`D96310 <https://reviews.llvm.org/D96310>`_)
|
||||
|
||||
* Support for in-order processors has been added to ``llvm-mca``.
|
||||
(`D94928 <https://reviews.llvm.org/D94928>`_)
|
||||
|
||||
Changes to LLDB
|
||||
---------------------------------
|
||||
|
||||
|
@ -108,15 +108,16 @@ struct MCReadAdvanceEntry {
|
||||
///
|
||||
/// Defined as an aggregate struct for creating tables with initializer lists.
|
||||
struct MCSchedClassDesc {
|
||||
static const unsigned short InvalidNumMicroOps = (1U << 14) - 1;
|
||||
static const unsigned short InvalidNumMicroOps = (1U << 13) - 1;
|
||||
static const unsigned short VariantNumMicroOps = InvalidNumMicroOps - 1;
|
||||
|
||||
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
|
||||
const char* Name;
|
||||
#endif
|
||||
uint16_t NumMicroOps : 14;
|
||||
uint16_t NumMicroOps : 13;
|
||||
uint16_t BeginGroup : 1;
|
||||
uint16_t EndGroup : 1;
|
||||
uint16_t RetireOOO : 1;
|
||||
uint16_t WriteProcResIdx; // First index into WriteProcResTable.
|
||||
uint16_t NumWriteProcResEntries;
|
||||
uint16_t WriteLatencyIdx; // First index into WriteLatencyTable.
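The hunk above narrows `NumMicroOps` from 14 to 13 bits to make room for the new `RetireOOO` flag, which is why the invalid sentinel shrinks from `(1U << 14) - 1` to `(1U << 13) - 1`. The sketch below illustrates the effect with a toy struct; `ToySchedClassBits` and `fitsInNumMicroOps` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the packed fields shown in the hunk above.
struct ToySchedClassBits {
  std::uint16_t NumMicroOps : 13;
  std::uint16_t BeginGroup : 1;
  std::uint16_t EndGroup : 1;
  std::uint16_t RetireOOO : 1;
};

constexpr unsigned ToyNumMicroOpsBits = 13;
constexpr unsigned short ToyInvalidNumMicroOps =
    (1U << ToyNumMicroOpsBits) - 1;
static_assert(ToyInvalidNumMicroOps == 8191, "largest 13-bit value");

// Returns true if Value survives a round trip through the 13-bit field.
// Storing a wider value into an unsigned bit-field keeps only the low bits.
bool fitsInNumMicroOps(unsigned Value) {
  ToySchedClassBits Bits{};
  Bits.NumMicroOps = Value;
  return Bits.NumMicroOps == Value;
}
```

The old sentinel 16383 no longer fits in the field, while 8191 does, which matches the updated `CHECK` lines in the TableGen test touched by this commit.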
|
||||
|
@ -68,6 +68,11 @@ public:
|
||||
/// This pipeline consists of Fetch, Dispatch, Execute, and Retire stages.
|
||||
std::unique_ptr<Pipeline> createDefaultPipeline(const PipelineOptions &Opts,
|
||||
SourceMgr &SrcMgr);
|
||||
|
||||
/// Construct a basic pipeline for simulating an in-order pipeline.
|
||||
/// This pipeline consists of Fetch, InOrderIssue, and Retire stages.
|
||||
std::unique_ptr<Pipeline> createInOrderPipeline(const PipelineOptions &Opts,
|
||||
SourceMgr &SrcMgr);
|
||||
};
|
||||
|
||||
} // namespace mca
|
||||
|
@ -172,11 +172,6 @@ class RegisterFile : public HardwareUnit {
|
||||
void freePhysRegs(const RegisterRenamingInfo &Entry,
|
||||
MutableArrayRef<unsigned> FreedPhysRegs);
|
||||
|
||||
// Collects writes that are in a RAW dependency with RS.
|
||||
// This method is called from `addRegisterRead()`.
|
||||
void collectWrites(const ReadState &RS,
|
||||
SmallVectorImpl<WriteRef> &Writes) const;
|
||||
|
||||
// Create an instance of RegisterMappingTracker for every register file
|
||||
// specified by the processor model.
|
||||
// If no register file is specified, then this method creates a default
|
||||
@ -187,6 +182,10 @@ public:
|
||||
RegisterFile(const MCSchedModel &SM, const MCRegisterInfo &mri,
|
||||
unsigned NumRegs = 0);
|
||||
|
||||
// Collects writes that are in a RAW dependency with RS.
|
||||
void collectWrites(const ReadState &RS,
|
||||
SmallVectorImpl<WriteRef> &Writes) const;
|
||||
|
||||
// This method updates the register mappings inserting a new register
|
||||
// definition. This method is also responsible for updating the number of
|
||||
// allocated physical registers in each register file modified by the write.
|
||||
|
@ -104,6 +104,9 @@ public:
|
||||
#ifndef NDEBUG
|
||||
void dump() const;
|
||||
#endif
|
||||
|
||||
// Assigned to instructions that are not handled by the RCU.
|
||||
static const unsigned UnhandledTokenID = ~0U;
|
||||
};
|
||||
|
||||
} // namespace mca
|
||||
|
@ -375,6 +375,7 @@ struct InstrDesc {
|
||||
bool HasSideEffects;
|
||||
bool BeginGroup;
|
||||
bool EndGroup;
|
||||
bool RetireOOO;
|
||||
|
||||
// True if all buffered resources are in-order, and there is at least one
|
||||
// buffer which is a dispatch hazard (BufferSize = 0).
|
||||
|
84
include/llvm/MCA/Stages/InOrderIssueStage.h
Normal file
@ -0,0 +1,84 @@
|
||||
//===---------------------- InOrderIssueStage.h -----------------*- C++ -*-===//
|
||||
//
|
||||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
||||
// See https://llvm.org/LICENSE.txt for license information.
|
||||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
||||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
/// \file
|
||||
///
|
||||
/// InOrderIssueStage implements an in-order execution pipeline.
|
||||
///
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
#ifndef LLVM_MCA_IN_ORDER_ISSUE_STAGE_H
|
||||
#define LLVM_MCA_IN_ORDER_ISSUE_STAGE_H
|
||||
|
||||
#include "llvm/ADT/SmallVector.h"
|
||||
#include "llvm/MCA/SourceMgr.h"
|
||||
#include "llvm/MCA/Stages/Stage.h"
|
||||
|
||||
#include <queue>
|
||||
|
||||
namespace llvm {
|
||||
struct MCSchedModel;
|
||||
class MCSubtargetInfo;
|
||||
|
||||
namespace mca {
|
||||
class RegisterFile;
|
||||
class ResourceManager;
|
||||
struct RetireControlUnit;
|
||||
|
||||
class InOrderIssueStage final : public Stage {
|
||||
const MCSchedModel &SM;
|
||||
const MCSubtargetInfo &STI;
|
||||
RetireControlUnit &RCU;
|
||||
RegisterFile &PRF;
|
||||
std::unique_ptr<ResourceManager> RM;
|
||||
|
||||
/// Instructions that were issued, but not executed yet.
|
||||
SmallVector<InstRef, 4> IssuedInst;
|
||||
|
||||
/// Number of instructions issued in the current cycle.
|
||||
unsigned NumIssued;
|
||||
|
||||
/// If an instruction cannot execute due to an unmet register or resource
|
||||
/// dependency, then it is stalled for StallCyclesLeft cycles.
|
||||
InstRef StalledInst;
|
||||
unsigned StallCyclesLeft;
|
||||
|
||||
/// Number of instructions that can be issued in the current cycle.
|
||||
unsigned Bandwidth;
|
||||
|
||||
InOrderIssueStage(const InOrderIssueStage &Other) = delete;
|
||||
InOrderIssueStage &operator=(const InOrderIssueStage &Other) = delete;
|
||||
|
||||
/// If IR has an unmet register or resource dependency, canExecute returns
|
||||
/// false. StallCycles is set to the number of cycles left before the
|
||||
/// instruction can be issued.
|
||||
bool canExecute(const InstRef &IR, unsigned *StallCycles) const;
|
||||
|
||||
/// Issue the instruction, or update StallCycles if IR is stalled.
|
||||
Error tryIssue(InstRef &IR, unsigned *StallCycles);
|
||||
|
||||
/// Update status of instructions from IssuedInst.
|
||||
Error updateIssuedInst();
|
||||
|
||||
public:
|
||||
InOrderIssueStage(RetireControlUnit &RCU, RegisterFile &PRF,
|
||||
const MCSchedModel &SM, const MCSubtargetInfo &STI)
|
||||
: SM(SM), STI(STI), RCU(RCU), PRF(PRF),
|
||||
RM(std::make_unique<ResourceManager>(SM)), StallCyclesLeft(0),
|
||||
Bandwidth(0) {}
|
||||
|
||||
bool isAvailable(const InstRef &) const override;
|
||||
bool hasWorkToComplete() const override;
|
||||
Error execute(InstRef &IR) override;
|
||||
Error cycleStart() override;
|
||||
Error cycleEnd() override;
|
||||
};
|
||||
|
||||
} // namespace mca
|
||||
} // namespace llvm
|
||||
|
||||
#endif // LLVM_MCA_IN_ORDER_ISSUE_STAGE_H
|
@ -16,6 +16,7 @@
|
||||
#ifndef LLVM_MCA_STAGES_RETIRESTAGE_H
|
||||
#define LLVM_MCA_STAGES_RETIRESTAGE_H
|
||||
|
||||
#include "llvm/ADT/SmallVector.h"
|
||||
#include "llvm/MCA/HardwareUnits/LSUnit.h"
|
||||
#include "llvm/MCA/HardwareUnits/RegisterFile.h"
|
||||
#include "llvm/MCA/HardwareUnits/RetireControlUnit.h"
|
||||
@ -29,6 +30,7 @@ class RetireStage final : public Stage {
|
||||
RetireControlUnit &RCU;
|
||||
RegisterFile &PRF;
|
||||
LSUnitBase &LSU;
|
||||
SmallVector<InstRef, 4> RetireInst;
|
||||
|
||||
RetireStage(const RetireStage &Other) = delete;
|
||||
RetireStage &operator=(const RetireStage &Other) = delete;
|
||||
@ -37,7 +39,9 @@ public:
|
||||
RetireStage(RetireControlUnit &R, RegisterFile &F, LSUnitBase &LS)
|
||||
: Stage(), RCU(R), PRF(F), LSU(LS) {}
|
||||
|
||||
bool hasWorkToComplete() const override { return !RCU.isEmpty(); }
|
||||
bool hasWorkToComplete() const override {
|
||||
return !RCU.isEmpty() || !RetireInst.empty();
|
||||
}
|
||||
Error cycleStart() override;
|
||||
Error execute(InstRef &IR) override;
|
||||
void notifyInstructionRetired(const InstRef &IR) const;
|
||||
|
@ -262,6 +262,10 @@ class ProcWriteResources<list<ProcResourceKind> resources> {
|
||||
// Allow a processor to mark some scheduling classes as single-issue.
|
||||
// SingleIssue is an alias for Begin/End Group.
|
||||
bit SingleIssue = false;
|
||||
// An instruction is allowed to retire out-of-order if RetireOOO is
|
||||
// true for at least one of its writes. This field is only used by
|
||||
// MCA for in-order subtargets, and is ignored for other targets.
|
||||
bit RetireOOO = false;
|
||||
SchedMachineModel SchedModel = ?;
|
||||
}
|
||||
|
||||
|
@ -14,6 +14,7 @@ add_llvm_component_library(LLVMMCA
|
||||
Stages/DispatchStage.cpp
|
||||
Stages/EntryStage.cpp
|
||||
Stages/ExecuteStage.cpp
|
||||
Stages/InOrderIssueStage.cpp
|
||||
Stages/InstructionTables.cpp
|
||||
Stages/MicroOpQueueStage.cpp
|
||||
Stages/RetireStage.cpp
|
||||
|
@ -21,6 +21,7 @@
|
||||
#include "llvm/MCA/Stages/DispatchStage.h"
|
||||
#include "llvm/MCA/Stages/EntryStage.h"
|
||||
#include "llvm/MCA/Stages/ExecuteStage.h"
|
||||
#include "llvm/MCA/Stages/InOrderIssueStage.h"
|
||||
#include "llvm/MCA/Stages/MicroOpQueueStage.h"
|
||||
#include "llvm/MCA/Stages/RetireStage.h"
|
||||
|
||||
@ -31,6 +32,9 @@ std::unique_ptr<Pipeline>
|
||||
Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
|
||||
const MCSchedModel &SM = STI.getSchedModel();
|
||||
|
||||
if (!SM.isOutOfOrder())
|
||||
return createInOrderPipeline(Opts, SrcMgr);
|
||||
|
||||
// Create the hardware units defining the backend.
|
||||
auto RCU = std::make_unique<RetireControlUnit>(SM);
|
||||
auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);
|
||||
@ -64,5 +68,29 @@ Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
|
||||
return StagePipeline;
|
||||
}
|
||||
|
||||
std::unique_ptr<Pipeline>
|
||||
Context::createInOrderPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
|
||||
const MCSchedModel &SM = STI.getSchedModel();
|
||||
auto RCU = std::make_unique<RetireControlUnit>(SM);
|
||||
auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);
|
||||
auto LSU = std::make_unique<LSUnit>(SM, Opts.LoadQueueSize,
|
||||
Opts.StoreQueueSize, Opts.AssumeNoAlias);
|
||||
|
||||
auto Entry = std::make_unique<EntryStage>(SrcMgr);
|
||||
auto InOrderIssue = std::make_unique<InOrderIssueStage>(*RCU, *PRF, SM, STI);
|
||||
auto Retire = std::make_unique<RetireStage>(*RCU, *PRF, *LSU);
|
||||
|
||||
auto StagePipeline = std::make_unique<Pipeline>();
|
||||
StagePipeline->appendStage(std::move(Entry));
|
||||
StagePipeline->appendStage(std::move(InOrderIssue));
|
||||
StagePipeline->appendStage(std::move(Retire));
|
||||
|
||||
addHardwareUnit(std::move(RCU));
|
||||
addHardwareUnit(std::move(PRF));
|
||||
addHardwareUnit(std::move(LSU));
|
||||
|
||||
return StagePipeline;
|
||||
}
|
||||
|
||||
} // namespace mca
|
||||
} // namespace llvm
|
||||
|
@ -33,12 +33,18 @@ RetireControlUnit::RetireControlUnit(const MCSchedModel &SM)
|
||||
MaxRetirePerCycle = EPI.MaxRetirePerCycle;
|
||||
}
|
||||
NumROBEntries = AvailableEntries;
|
||||
bool IsOutOfOrder = SM.MicroOpBufferSize;
|
||||
if (!IsOutOfOrder && !NumROBEntries)
|
||||
return;
|
||||
assert(NumROBEntries && "Invalid reorder buffer size!");
|
||||
Queue.resize(2 * NumROBEntries);
|
||||
}
|
||||
|
||||
// Reserves a number of slots, and returns a new token.
|
||||
unsigned RetireControlUnit::dispatch(const InstRef &IR) {
|
||||
if (!NumROBEntries)
|
||||
return UnhandledTokenID;
|
||||
|
||||
const Instruction &Inst = *IR.getInstruction();
|
||||
unsigned Entries = normalizeQuantity(Inst.getNumMicroOps());
|
||||
assert((AvailableEntries >= Entries) && "Reorder Buffer unavailable!");
|
||||
@ -47,6 +53,7 @@ unsigned RetireControlUnit::dispatch(const InstRef &IR) {
|
||||
Queue[NextAvailableSlotIdx] = {IR, Entries, false};
|
||||
NextAvailableSlotIdx += std::max(1U, Entries);
|
||||
NextAvailableSlotIdx %= Queue.size();
|
||||
assert(TokenID < UnhandledTokenID && "Invalid token ID");
|
||||
|
||||
AvailableEntries -= Entries;
|
||||
return TokenID;
|
||||
|
@ -570,6 +570,7 @@ InstrBuilder::createInstrDescImpl(const MCInst &MCI) {
|
||||
ID->HasSideEffects = MCDesc.hasUnmodeledSideEffects();
|
||||
ID->BeginGroup = SCDesc.BeginGroup;
|
||||
ID->EndGroup = SCDesc.EndGroup;
|
||||
ID->RetireOOO = SCDesc.RetireOOO;
|
||||
|
||||
initializeUsedResources(*ID, SCDesc, STI, ProcResourceMasks);
|
||||
computeMaxLatency(*ID, MCDesc, SCDesc, STI);
|
||||
|
292
lib/MCA/Stages/InOrderIssueStage.cpp
Normal file
@ -0,0 +1,292 @@
|
||||
//===---------------------- InOrderIssueStage.cpp ---------------*- C++ -*-===//
|
||||
//
|
||||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
||||
// See https://llvm.org/LICENSE.txt for license information.
|
||||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
||||
//
|
||||
//===----------------------------------------------------------------------===//
|
||||
/// \file
|
||||
///
|
||||
/// InOrderIssueStage implements an in-order execution pipeline.
|
||||
///
|
||||
//===----------------------------------------------------------------------===//
|
||||
|
||||
#include "llvm/MCA/Stages/InOrderIssueStage.h"
|
||||
|
||||
#include "llvm/MC/MCSchedule.h"
|
||||
#include "llvm/MCA/HWEventListener.h"
|
||||
#include "llvm/MCA/HardwareUnits/RegisterFile.h"
|
||||
#include "llvm/MCA/HardwareUnits/ResourceManager.h"
|
||||
#include "llvm/MCA/HardwareUnits/RetireControlUnit.h"
|
||||
#include "llvm/MCA/Instruction.h"
|
||||
#include "llvm/Support/Debug.h"
|
||||
#include "llvm/Support/Error.h"
|
||||
|
||||
#include <algorithm>
|
||||
|
||||
#define DEBUG_TYPE "llvm-mca"
|
||||
namespace llvm {
|
||||
namespace mca {
|
||||
|
||||
bool InOrderIssueStage::hasWorkToComplete() const {
|
||||
return !IssuedInst.empty() || StalledInst;
|
||||
}
|
||||
|
||||
bool InOrderIssueStage::isAvailable(const InstRef &IR) const {
|
||||
const Instruction &Inst = *IR.getInstruction();
|
||||
unsigned NumMicroOps = Inst.getNumMicroOps();
|
||||
const InstrDesc &Desc = Inst.getDesc();
|
||||
|
||||
if (Bandwidth < NumMicroOps)
|
||||
return false;
|
||||
|
||||
// Instruction with BeginGroup must be the first instruction to be issued in a
|
||||
// cycle.
|
||||
if (Desc.BeginGroup && NumIssued != 0)
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
static bool hasResourceHazard(const ResourceManager &RM, const InstRef &IR) {
|
||||
if (RM.checkAvailability(IR.getInstruction()->getDesc())) {
|
||||
LLVM_DEBUG(dbgs() << "[E] Stall #" << IR << '\n');
|
||||
return true;
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
/// Return the number of cycles left until register requirements of the
|
||||
/// instructions are met.
|
||||
static unsigned checkRegisterHazard(const RegisterFile &PRF,
|
||||
const MCSchedModel &SM,
|
||||
const MCSubtargetInfo &STI,
|
||||
const InstRef &IR) {
|
||||
unsigned StallCycles = 0;
|
||||
SmallVector<WriteRef, 4> Writes;
|
||||
|
||||
for (const ReadState &RS : IR.getInstruction()->getUses()) {
|
||||
const ReadDescriptor &RD = RS.getDescriptor();
|
||||
const MCSchedClassDesc *SC = SM.getSchedClassDesc(RD.SchedClassID);
|
||||
|
||||
PRF.collectWrites(RS, Writes);
|
||||
for (const WriteRef &WR : Writes) {
|
||||
const WriteState *WS = WR.getWriteState();
|
||||
unsigned WriteResID = WS->getWriteResourceID();
|
||||
int ReadAdvance = STI.getReadAdvanceCycles(SC, RD.UseIndex, WriteResID);
|
||||
LLVM_DEBUG(dbgs() << "[E] ReadAdvance for #" << IR << ": " << ReadAdvance
|
||||
<< '\n');
|
||||
|
||||
if (WS->getCyclesLeft() == UNKNOWN_CYCLES) {
|
||||
// Try again in the next cycle until the value is known
|
||||
StallCycles = std::max(StallCycles, 1U);
|
||||
continue;
|
||||
}
|
||||
|
||||
int CyclesLeft = WS->getCyclesLeft() - ReadAdvance;
|
||||
if (CyclesLeft > 0) {
|
||||
LLVM_DEBUG(dbgs() << "[E] Register hazard: " << WS->getRegisterID()
|
||||
<< '\n');
|
||||
StallCycles = std::max(StallCycles, (unsigned)CyclesLeft);
|
||||
}
|
||||
}
|
||||
Writes.clear();
|
||||
}
|
||||
|
||||
return StallCycles;
|
||||
}
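The stall computation in `checkRegisterHazard` can be restated on plain data. The sketch below is a simplified illustration, not the LLVM code: `ToyPendingWrite`, `ToyUnknownCycles` and `toyStallCycles` are hypothetical names, and the sentinel value is an assumption chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed sentinel meaning "latency of this write is not known yet".
constexpr int ToyUnknownCycles = -512;

// Hypothetical flattened view of one pending write feeding a read.
struct ToyPendingWrite {
  int CyclesLeft;  // cycles until the write completes, or ToyUnknownCycles
  int ReadAdvance; // cycles by which the reader may consume the value early
};

// Number of cycles to stall: the largest (CyclesLeft - ReadAdvance) over all
// pending writes, with at least one cycle if any latency is still unknown.
unsigned toyStallCycles(const std::vector<ToyPendingWrite> &Writes) {
  unsigned StallCycles = 0;
  for (const ToyPendingWrite &W : Writes) {
    if (W.CyclesLeft == ToyUnknownCycles) {
      // Try again in the next cycle until the value is known.
      StallCycles = std::max(StallCycles, 1U);
      continue;
    }
    int CyclesLeft = W.CyclesLeft - W.ReadAdvance;
    if (CyclesLeft > 0)
      StallCycles = std::max(StallCycles, static_cast<unsigned>(CyclesLeft));
  }
  return StallCycles;
}
```

A write with 3 cycles left and a read advance of 2 contributes a 1-cycle stall, and a read advance larger than the remaining latency contributes none.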
|
||||
|
||||
bool InOrderIssueStage::canExecute(const InstRef &IR,
|
||||
unsigned *StallCycles) const {
|
||||
*StallCycles = 0;
|
||||
|
||||
if (unsigned RegStall = checkRegisterHazard(PRF, SM, STI, IR)) {
|
||||
*StallCycles = RegStall;
|
||||
// FIXME: add a parameter to HWStallEvent to indicate a number of cycles.
|
||||
for (unsigned I = 0; I < RegStall; ++I) {
|
||||
notifyEvent<HWStallEvent>(
|
||||
HWStallEvent(HWStallEvent::RegisterFileStall, IR));
|
||||
notifyEvent<HWPressureEvent>(
|
||||
HWPressureEvent(HWPressureEvent::REGISTER_DEPS, IR));
|
||||
}
|
||||
} else if (hasResourceHazard(*RM, IR)) {
|
||||
*StallCycles = 1;
|
||||
notifyEvent<HWStallEvent>(
|
||||
HWStallEvent(HWStallEvent::DispatchGroupStall, IR));
|
||||
notifyEvent<HWPressureEvent>(
|
||||
HWPressureEvent(HWPressureEvent::RESOURCES, IR));
|
||||
}
|
||||
|
||||
return *StallCycles == 0;
|
||||
}
|
||||
|
||||
static void addRegisterReadWrite(RegisterFile &PRF, Instruction &IS,
|
||||
unsigned SourceIndex,
|
||||
const MCSubtargetInfo &STI,
|
||||
SmallVectorImpl<unsigned> &UsedRegs) {
|
||||
assert(!IS.isEliminated());
|
||||
|
||||
for (ReadState &RS : IS.getUses())
|
||||
PRF.addRegisterRead(RS, STI);
|
||||
|
||||
for (WriteState &WS : IS.getDefs())
|
||||
PRF.addRegisterWrite(WriteRef(SourceIndex, &WS), UsedRegs);
|
||||
}
|
||||
|
||||
static void notifyInstructionExecute(
|
||||
const InstRef &IR,
|
||||
const SmallVectorImpl<std::pair<ResourceRef, ResourceCycles>> &UsedRes,
|
||||
const Stage &S) {
|
||||
|
||||
S.notifyEvent<HWInstructionEvent>(
|
||||
HWInstructionEvent(HWInstructionEvent::Ready, IR));
|
||||
S.notifyEvent<HWInstructionEvent>(HWInstructionIssuedEvent(IR, UsedRes));
|
||||
|
||||
LLVM_DEBUG(dbgs() << "[E] Issued #" << IR << "\n");
|
||||
}
|
||||
|
||||
static void notifyInstructionDispatch(const InstRef &IR, unsigned Ops,
|
||||
const SmallVectorImpl<unsigned> &UsedRegs,
|
||||
const Stage &S) {
|
||||
|
||||
S.notifyEvent<HWInstructionEvent>(
|
||||
HWInstructionDispatchedEvent(IR, UsedRegs, Ops));
|
||||
|
||||
LLVM_DEBUG(dbgs() << "[E] Dispatched #" << IR << "\n");
|
||||
}
|
||||
|
||||
llvm::Error InOrderIssueStage::execute(InstRef &IR) {
|
||||
Instruction &IS = *IR.getInstruction();
|
||||
const InstrDesc &Desc = IS.getDesc();
|
||||
|
||||
unsigned RCUTokenID = RetireControlUnit::UnhandledTokenID;
|
||||
if (!Desc.RetireOOO)
|
||||
RCUTokenID = RCU.dispatch(IR);
|
||||
IS.dispatch(RCUTokenID);
|
||||
|
||||
if (Desc.EndGroup) {
|
||||
Bandwidth = 0;
|
||||
} else {
|
||||
unsigned NumMicroOps = IR.getInstruction()->getNumMicroOps();
|
||||
assert(Bandwidth >= NumMicroOps);
|
||||
Bandwidth -= NumMicroOps;
|
||||
}
|
||||
|
||||
if (llvm::Error E = tryIssue(IR, &StallCyclesLeft))
|
||||
return E;
|
||||
|
||||
if (StallCyclesLeft) {
|
||||
StalledInst = IR;
|
||||
Bandwidth = 0;
|
||||
}
|
||||
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
llvm::Error InOrderIssueStage::tryIssue(InstRef &IR, unsigned *StallCycles) {
|
||||
Instruction &IS = *IR.getInstruction();
|
||||
unsigned SourceIndex = IR.getSourceIndex();
|
||||
|
||||
if (!canExecute(IR, StallCycles)) {
|
||||
LLVM_DEBUG(dbgs() << "[E] Stalled #" << IR << " for " << *StallCycles
|
||||
<< " cycles\n");
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
SmallVector<unsigned, 4> UsedRegs(PRF.getNumRegisterFiles());
|
||||
addRegisterReadWrite(PRF, IS, SourceIndex, STI, UsedRegs);
|
||||
|
||||
notifyInstructionDispatch(IR, IS.getDesc().NumMicroOps, UsedRegs, *this);
|
||||
|
||||
SmallVector<std::pair<ResourceRef, ResourceCycles>, 4> UsedResources;
|
||||
RM->issueInstruction(IS.getDesc(), UsedResources);
|
||||
IS.execute(SourceIndex);
|
||||
|
||||
// Replace resource masks with valid resource processor IDs.
|
||||
for (std::pair<ResourceRef, ResourceCycles> &Use : UsedResources) {
|
||||
uint64_t Mask = Use.first.first;
|
||||
Use.first.first = RM->resolveResourceMask(Mask);
|
||||
}
|
||||
notifyInstructionExecute(IR, UsedResources, *this);
|
||||
|
||||
IssuedInst.push_back(IR);
|
||||
++NumIssued;
|
||||
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
llvm::Error InOrderIssueStage::updateIssuedInst() {
|
||||
// Update other instructions. Executed instructions will be retired during the
|
||||
// next cycle.
|
||||
unsigned NumExecuted = 0;
|
||||
for (auto I = IssuedInst.begin(), E = IssuedInst.end();
|
||||
I != (E - NumExecuted);) {
|
||||
InstRef &IR = *I;
|
||||
Instruction &IS = *IR.getInstruction();
|
||||
|
||||
IS.cycleEvent();
|
||||
if (!IS.isExecuted()) {
|
||||
LLVM_DEBUG(dbgs() << "[E] Instruction #" << IR
|
||||
<< " is still executing\n");
|
||||
++I;
|
||||
continue;
|
||||
}
|
||||
notifyEvent<HWInstructionEvent>(
|
||||
HWInstructionEvent(HWInstructionEvent::Executed, IR));
|
||||
|
||||
LLVM_DEBUG(dbgs() << "[E] Instruction #" << IR << " is executed\n");
|
||||
++NumExecuted;
|
||||
std::iter_swap(I, E - NumExecuted);
|
||||
}
|
||||
|
||||
// Retire instructions in the next cycle
|
||||
if (NumExecuted) {
|
||||
for (auto I = IssuedInst.end() - NumExecuted, E = IssuedInst.end(); I != E;
|
||||
++I) {
|
||||
if (llvm::Error E = moveToTheNextStage(*I))
|
||||
return E;
|
||||
}
|
||||
IssuedInst.resize(IssuedInst.size() - NumExecuted);
|
||||
}
|
||||
|
||||
return llvm::ErrorSuccess();
|
||||
}
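`updateIssuedInst` uses a swap-to-the-end idiom: finished entries are swapped behind a shrinking boundary, so they can be handed on and dropped in one step without shifting the rest of the vector. The sketch below shows the same idiom on a toy container; `ToyIssued` and `tickAndCollectExecuted` are hypothetical names, and a plain countdown stands in for `Instruction::cycleEvent()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-flight entry: an id and the cycles it still needs.
struct ToyIssued {
  unsigned Id;
  int CyclesLeft;
};

// Advances every entry by one cycle, removes the finished ones from Issued
// and returns their ids. The relative order of entries is not preserved.
std::vector<unsigned> tickAndCollectExecuted(std::vector<ToyIssued> &Issued) {
  std::ptrdiff_t NumExecuted = 0;
  for (auto I = Issued.begin(), E = Issued.end(); I != (E - NumExecuted);) {
    if (I->CyclesLeft > 0)
      --I->CyclesLeft;
    if (I->CyclesLeft > 0) {
      ++I; // still executing
      continue;
    }
    // Finished: park it at the end and revisit the element swapped in.
    ++NumExecuted;
    std::iter_swap(I, E - NumExecuted);
  }

  std::vector<unsigned> Done;
  for (auto I = Issued.end() - NumExecuted, E = Issued.end(); I != E; ++I)
    Done.push_back(I->Id);
  Issued.erase(Issued.end() - NumExecuted, Issued.end());
  return Done;
}
```

Because the element swapped into the current position has not been visited yet, the iterator is not advanced after a swap, so each entry is ticked exactly once per call.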
|
||||
|
||||
llvm::Error InOrderIssueStage::cycleStart() {
|
||||
NumIssued = 0;
|
||||
|
||||
// Release consumed resources.
|
||||
SmallVector<ResourceRef, 4> Freed;
|
||||
RM->cycleEvent(Freed);
|
||||
|
||||
if (llvm::Error E = updateIssuedInst())
|
||||
return E;
|
||||
|
||||
// Issue instructions scheduled for this cycle
|
||||
if (!StallCyclesLeft && StalledInst) {
|
||||
if (llvm::Error E = tryIssue(StalledInst, &StallCyclesLeft))
|
||||
return E;
|
||||
}
|
||||
|
||||
if (!StallCyclesLeft) {
|
||||
StalledInst.invalidate();
|
||||
assert(NumIssued <= SM.IssueWidth && "Overflow.");
|
||||
Bandwidth = SM.IssueWidth - NumIssued;
|
||||
} else {
|
||||
// The instruction is still stalled, cannot issue any new instructions in
|
||||
// this cycle.
|
||||
Bandwidth = 0;
|
||||
}
|
||||
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
llvm::Error InOrderIssueStage::cycleEnd() {
|
||||
if (StallCyclesLeft > 0)
|
||||
--StallCyclesLeft;
|
||||
return llvm::ErrorSuccess();
|
||||
}
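The interplay between `cycleEnd()` (which counts `StallCyclesLeft` down) and `cycleStart()` (which retries the stalled instruction once the counter reaches zero) can be traced with a small sketch. `toyRetryCycle` is a hypothetical helper written for this illustration and assumes the counter is not modified in between.

```cpp
#include <cassert>

// Given the cycle in which an instruction stalled and the stall length,
// returns the cycle in which the stalled instruction is retried.
unsigned toyRetryCycle(unsigned StallCycle, unsigned StallCycles) {
  assert(StallCycles > 0 && "only stalled instructions are retried");
  unsigned Cycle = StallCycle;
  unsigned StallCyclesLeft = StallCycles;
  for (;;) {
    // End of the current cycle: count the stall down.
    if (StallCyclesLeft > 0)
      --StallCyclesLeft;
    ++Cycle;
    // Start of the next cycle: retry only when the stall has elapsed;
    // otherwise no instruction can be issued in this cycle.
    if (StallCyclesLeft == 0)
      return Cycle;
  }
}
```

An instruction that stalls for 3 cycles in cycle 5 is therefore retried in cycle 8. In the stage itself, the retry may report a new stall, in which case the countdown starts again.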
|
||||
|
||||
} // namespace mca
|
||||
} // namespace llvm
|
@ -23,9 +23,6 @@ namespace llvm {
|
||||
namespace mca {
|
||||
|
||||
llvm::Error RetireStage::cycleStart() {
|
||||
if (RCU.isEmpty())
|
||||
return llvm::ErrorSuccess();
|
||||
|
||||
const unsigned MaxRetirePerCycle = RCU.getMaxRetirePerCycle();
|
||||
unsigned NumRetired = 0;
|
||||
while (!RCU.isEmpty()) {
|
||||
@ -39,11 +36,26 @@ llvm::Error RetireStage::cycleStart() {
|
||||
NumRetired++;
|
||||
}
|
||||
|
||||
// Retire instructions that are not controlled by the RCU
|
||||
for (InstRef &IR : RetireInst) {
|
||||
IR.getInstruction()->retire();
|
||||
notifyInstructionRetired(IR);
|
||||
}
|
||||
RetireInst.resize(0);
|
||||
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
llvm::Error RetireStage::execute(InstRef &IR) {
|
||||
RCU.onInstructionExecuted(IR.getInstruction()->getRCUTokenID());
|
||||
Instruction &IS = *IR.getInstruction();
|
||||
|
||||
unsigned TokenID = IS.getRCUTokenID();
|
||||
if (TokenID != RetireControlUnit::UnhandledTokenID) {
|
||||
RCU.onInstructionExecuted(TokenID);
|
||||
return llvm::ErrorSuccess();
|
||||
}
|
||||
|
||||
RetireInst.push_back(IR);
|
||||
return llvm::ErrorSuccess();
|
||||
}
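The routing in `RetireStage::execute` can be summarised as: instructions that hold a real RCU token are reported to the retire control unit, while instructions tagged with `UnhandledTokenID` are queued and retired directly at the start of the next cycle. The sketch below mimics that split with toy types; `ToyRetire` and its members are hypothetical and only record what would happen.

```cpp
#include <cassert>
#include <vector>

// Same sentinel value as UnhandledTokenID in the hunk above.
constexpr unsigned ToyUnhandledTokenID = ~0U;

// Hypothetical stand-in for the retire stage.
struct ToyRetire {
  std::vector<unsigned> RCUExecuted; // tokens reported to the toy RCU
  std::vector<unsigned> RetireInst;  // instruction ids bypassing the RCU

  // Called when an instruction has finished executing.
  void execute(unsigned InstId, unsigned TokenID) {
    if (TokenID != ToyUnhandledTokenID) {
      RCUExecuted.push_back(TokenID); // the RCU decides when it retires
      return;
    }
    RetireInst.push_back(InstId); // retire at the next cycle start
  }

  // Retires everything that is not controlled by the RCU and returns the
  // ids of the retired instructions.
  std::vector<unsigned> cycleStart() {
    std::vector<unsigned> Retired;
    Retired.swap(RetireInst);
    return Retired;
  }
};
```

This is also why `hasWorkToComplete()` in the header now checks `RetireInst` in addition to the RCU: instructions waiting in the side queue still need one more cycle to retire.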
|
||||
|
||||
|
@ -151,6 +151,8 @@ def CortexA55WriteFPALU_F5 : SchedWriteRes<[CortexA55UnitFPALU]> { let Latency =
|
||||
|
||||
// FP Mul, Div, Sqrt. Div/Sqrt are not pipelined
|
||||
def : WriteRes<WriteFMul, [CortexA55UnitFPMAC]> { let Latency = 4; }
|
||||
|
||||
let RetireOOO = 1 in {
|
||||
def : WriteRes<WriteFDiv, [CortexA55UnitFPDIV]> { let Latency = 22;
|
||||
let ResourceCycles = [29]; }
|
||||
def CortexA55WriteFMAC : SchedWriteRes<[CortexA55UnitFPMAC]> { let Latency = 4; }
|
||||
@ -166,7 +168,7 @@ def CortexA55WriteFSqrtSP : SchedWriteRes<[CortexA55UnitFPDIV]> { let Latency =
|
||||
let ResourceCycles = [9]; }
|
||||
def CortexA55WriteFSqrtDP : SchedWriteRes<[CortexA55UnitFPDIV]> { let Latency = 22;
|
||||
let ResourceCycles = [19]; }
|
||||
|
||||
}
|
||||
//===----------------------------------------------------------------------===//
|
||||
// Subtarget-specific SchedRead types.
|
||||
|
||||
@ -336,4 +338,6 @@ def : InstRW<[CortexA55WriteFDivDP], (instregex "^FDIVv.*64$")>;
|
||||
def : InstRW<[CortexA55WriteFSqrtHP], (instregex "^.*SQRT.*16$")>;
|
||||
def : InstRW<[CortexA55WriteFSqrtSP], (instregex "^.*SQRT.*32$")>;
|
||||
def : InstRW<[CortexA55WriteFSqrtDP], (instregex "^.*SQRT.*64$")>;
|
||||
|
||||
def A55RCU : RetireControlUnit<64, 0>;
|
||||
}
|
||||
|
@ -19,7 +19,7 @@ let CompleteModel = 0 in {
|
||||
// Inst_B didn't have the resources, and it is invalid.
|
||||
// CHECK: SchedModel_ASchedClasses[] = {
|
||||
// CHECK: {DBGFIELD("Inst_A") 1
|
||||
// CHECK-NEXT: {DBGFIELD("Inst_B") 16383
|
||||
// CHECK-NEXT: {DBGFIELD("Inst_B") 8191
|
||||
let SchedModel = SchedModel_A in {
|
||||
def Write_A : SchedWriteRes<[]>;
|
||||
def : InstRW<[Write_A], (instrs Inst_A)>;
|
||||
@ -27,7 +27,7 @@ let SchedModel = SchedModel_A in {
|
||||
|
||||
// Inst_A didn't have the resources, and it is invalid.
|
||||
// CHECK: SchedModel_BSchedClasses[] = {
|
||||
// CHECK: {DBGFIELD("Inst_A") 16383
|
||||
// CHECK: {DBGFIELD("Inst_A") 8191
|
||||
// CHECK-NEXT: {DBGFIELD("Inst_B") 1
|
||||
let SchedModel = SchedModel_B in {
|
||||
def Write_B: SchedWriteRes<[]>;
|
||||
|
81
test/tools/llvm-mca/AArch64/Cortex/A55-add-sequence.s
Normal file
@ -0,0 +1,81 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --timeline --iterations=2 < %s | FileCheck %s
|
||||
|
||||
add w2, w3, #1
|
||||
add w4, w3, #2, lsl #12
|
||||
add w0, w4, #3
|
||||
add w1, w0, #4
|
||||
|
||||
# CHECK: Iterations: 2
|
||||
# CHECK-NEXT: Instructions: 8
|
||||
# CHECK-NEXT: Total Cycles: 10
|
||||
# CHECK-NEXT: Total uOps: 8
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.80
|
||||
# CHECK-NEXT: IPC: 0.80
|
||||
# CHECK-NEXT: Block RThroughput: 2.0
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 1 3 0.50 add w2, w3, #1
|
||||
# CHECK-NEXT: 1 3 0.50 add w4, w3, #2, lsl #12
|
||||
# CHECK-NEXT: 1 3 0.50 add w0, w4, #3
|
||||
# CHECK-NEXT: 1 3 0.50 add w1, w0, #4
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [0.1] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [1] - CortexA55UnitB
|
||||
# CHECK-NEXT: [2] - CortexA55UnitDiv
|
||||
# CHECK-NEXT: [3.0] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [3.1] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [4] - CortexA55UnitFPDIV
|
||||
# CHECK-NEXT: [5.0] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [5.1] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [6] - CortexA55UnitLd
|
||||
# CHECK-NEXT: [7] - CortexA55UnitMAC
|
||||
# CHECK-NEXT: [8] - CortexA55UnitSt
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8]
|
||||
# CHECK-NEXT: 2.00 2.00 - - - - - - - - - -
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8] Instructions:
|
||||
# CHECK-NEXT: - 1.00 - - - - - - - - - - add w2, w3, #1
|
||||
# CHECK-NEXT: 1.00 - - - - - - - - - - - add w4, w3, #2, lsl #12
|
||||
# CHECK-NEXT: - 1.00 - - - - - - - - - - add w0, w4, #3
|
||||
# CHECK-NEXT: 1.00 - - - - - - - - - - - add w1, w0, #4
|
||||
|
||||
# CHECK: Timeline view:
|
||||
# CHECK-NEXT: Index 0123456789
|
||||
|
||||
# CHECK: [0,0] DeeER. . add w2, w3, #1
|
||||
# CHECK-NEXT: [0,1] DeeER. . add w4, w3, #2, lsl #12
|
||||
# CHECK-NEXT: [0,2] .DeeER . add w0, w4, #3
|
||||
# CHECK-NEXT: [0,3] . DeeER . add w1, w0, #4
|
||||
# CHECK-NEXT: [1,0] . DeeER . add w2, w3, #1
|
||||
# CHECK-NEXT: [1,1] . DeeER . add w4, w3, #2, lsl #12
|
||||
# CHECK-NEXT: [1,2] . DeeER. add w0, w4, #3
|
||||
# CHECK-NEXT: [1,3] . DeeER add w1, w0, #4
|
||||
|
||||
# CHECK: Average Wait times (based on the timeline view):
|
||||
# CHECK-NEXT: [0]: Executions
|
||||
# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
|
||||
# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
|
||||
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
|
||||
|
||||
# CHECK: [0] [1] [2] [3]
|
||||
# CHECK-NEXT: 0. 2 0.0 0.0 0.0 add w2, w3, #1
|
||||
# CHECK-NEXT: 1. 2 0.0 0.0 0.0 add w4, w3, #2, lsl #12
|
||||
# CHECK-NEXT: 2. 2 0.0 0.0 0.0 add w0, w4, #3
|
||||
# CHECK-NEXT: 3. 2 0.0 0.0 0.0 add w1, w0, #4
|
||||
# CHECK-NEXT: 2 0.0 0.0 0.0 <total>
|
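The Block RThroughput of 2.0 checked in this test matches the definition given in the llvm-mca documentation: the maximum of the micro-ops per iteration divided by the dispatch width and, for each consumed resource, its cycles per iteration divided by its number of units. A minimal C++ sketch of that arithmetic is shown here; `ResourceUse` and `blockRThroughput` are hypothetical names used for illustration and are not llvm-mca APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Cycles one loop iteration spends on a resource, and how many units of that
// resource the scheduling model declares. Hypothetical type for this sketch.
struct ResourceUse {
  double CyclesPerIteration;
  unsigned NumUnits;
};

// Upper bound on the reciprocal throughput of a block: the dispatch limit or
// the most contended resource, whichever is larger.
double blockRThroughput(unsigned UOpsPerIteration, unsigned DispatchWidth,
                        const std::vector<ResourceUse> &Resources) {
  assert(DispatchWidth != 0 && "dispatch width must be non-zero");
  double Result = static_cast<double>(UOpsPerIteration) / DispatchWidth;
  for (const ResourceUse &R : Resources) {
    assert(R.NumUnits != 0 && "a resource needs at least one unit");
    Result = std::max(Result, R.CyclesPerIteration / R.NumUnits);
  }
  return Result;
}
```

For this test, four single-uop adds against a dispatch width of 2 and two ALU units give max(4/2, 4/2) = 2.0. The same formula reproduces 8.0 for a block dominated by eight cycles on the single CortexA55UnitDiv unit.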
100
test/tools/llvm-mca/AArch64/Cortex/A55-all-stats.s
Normal file
@@ -0,0 +1,100 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-stats --iterations=2 < %s | FileCheck %s
|
||||
|
||||
ldr w4, [x2], #4
|
||||
ldr w5, [x3]
|
||||
madd w0, w5, w4, w0
|
||||
add x3, x3, x13
|
||||
subs x1, x1, #1
|
||||
str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Iterations: 2
|
||||
# CHECK-NEXT: Instructions: 12
|
||||
# CHECK-NEXT: Total Cycles: 21
|
||||
# CHECK-NEXT: Total uOps: 14
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.67
|
||||
# CHECK-NEXT: IPC: 0.57
|
||||
# CHECK-NEXT: Block RThroughput: 3.5
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 2 3 1.00 * ldr w4, [x2], #4
|
||||
# CHECK-NEXT: 1 3 1.00 * ldr w5, [x3]
|
||||
# CHECK-NEXT: 1 4 1.00 madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add x3, x3, x13
|
||||
# CHECK-NEXT: 1 3 0.50 subs x1, x1, #1
|
||||
# CHECK-NEXT: 1 4 1.00 * str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Dynamic Dispatch Stall Cycles:
|
||||
# CHECK-NEXT: RAT - Register unavailable: 10 (47.6%)
|
||||
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
|
||||
# CHECK-NEXT: SCHEDQ - Scheduler full: 0
|
||||
# CHECK-NEXT: LQ - Load queue full: 0
|
||||
# CHECK-NEXT: SQ - Store queue full: 0
|
||||
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
|
||||
|
||||
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
|
||||
# CHECK-NEXT: [# dispatched], [# cycles]
|
||||
# CHECK-NEXT: 0, 11 (52.4%)
|
||||
# CHECK-NEXT: 1, 6 (28.6%)
|
||||
# CHECK-NEXT: 2, 4 (19.0%)
|
||||
|
||||
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
|
||||
# CHECK-NEXT: [# issued], [# cycles]
|
||||
# CHECK-NEXT: 0, 11 (52.4%)
|
||||
# CHECK-NEXT: 1, 6 (28.6%)
|
||||
# CHECK-NEXT: 2, 4 (19.0%)
|
||||
|
||||
# CHECK: Scheduler's queue usage:
|
||||
# CHECK-NEXT: No scheduler resources used.
|
||||
|
||||
# CHECK: Retire Control Unit - number of cycles where we saw N instructions retired:
|
||||
# CHECK-NEXT: [# retired], [# cycles]
|
||||
# CHECK-NEXT: 0, 14 (66.7%)
|
||||
# CHECK-NEXT: 1, 4 (19.0%)
|
||||
# CHECK-NEXT: 2, 1 (4.8%)
|
||||
# CHECK-NEXT: 3, 2 (9.5%)
|
||||
|
||||
# CHECK: Total ROB Entries: 64
|
||||
# CHECK-NEXT: Max Used ROB Entries: 6 ( 9.4% )
|
||||
# CHECK-NEXT: Average Used ROB Entries per cy: 2 ( 3.1% )
|
||||
|
||||
# CHECK: Register File statistics:
|
||||
# CHECK-NEXT: Total number of mappings created: 14
|
||||
# CHECK-NEXT: Max number of mappings used: 6
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [0.1] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [1] - CortexA55UnitB
|
||||
# CHECK-NEXT: [2] - CortexA55UnitDiv
|
||||
# CHECK-NEXT: [3.0] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [3.1] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [4] - CortexA55UnitFPDIV
|
||||
# CHECK-NEXT: [5.0] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [5.1] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [6] - CortexA55UnitLd
|
||||
# CHECK-NEXT: [7] - CortexA55UnitMAC
|
||||
# CHECK-NEXT: [8] - CortexA55UnitSt
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8]
|
||||
# CHECK-NEXT: 1.00 1.00 - - - - - - - 2.00 1.00 1.00
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8] Instructions:
|
||||
# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr w4, [x2], #4
|
||||
# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr w5, [x3]
|
||||
# CHECK-NEXT: - - - - - - - - - - 1.00 - madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: - 1.00 - - - - - - - - - - add x3, x3, x13
|
||||
# CHECK-NEXT: 1.00 - - - - - - - - - - - subs x1, x1, #1
|
||||
# CHECK-NEXT: - - - - - - - - - - - 1.00 str w0, [x21, x18, lsl #2]
|
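The summary figures in this test can be cross-checked against each other: IPC is instructions over total cycles, uOps Per Cycle is micro-ops over total cycles, and the dispatch histogram should add up to the same totals. The C++ sketch below illustrates that bookkeeping only; the function names and the `Histogram` alias are hypothetical, and the two-decimal formatting is an assumption based on the printed values rather than a copy of the tool's rounding code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Formats Numerator / Denominator with two decimals, matching the precision
// shown in the summary view.
std::string formatRatio(unsigned Numerator, unsigned Denominator) {
  assert(Denominator != 0 && "ratio needs a non-zero cycle count");
  char Buffer[32];
  std::snprintf(Buffer, sizeof(Buffer), "%.2f",
                static_cast<double>(Numerator) / Denominator);
  return Buffer;
}

// Each entry is {N, number of cycles in which N micro-ops were dispatched}.
using Histogram = std::vector<std::pair<unsigned, unsigned>>;

// Sum of the cycle counts: should equal "Total Cycles".
unsigned totalCycles(const Histogram &H) {
  unsigned Cycles = 0;
  for (const auto &Entry : H)
    Cycles += Entry.second;
  return Cycles;
}

// Sum of N * cycles: should equal "Total uOps".
unsigned totalUOps(const Histogram &H) {
  unsigned UOps = 0;
  for (const auto &Entry : H)
    UOps += Entry.first * Entry.second;
  return UOps;
}
```

With the values checked above, 12 instructions and 14 micro-ops over 21 cycles give 0.57 and 0.67, and the histogram rows (0, 11), (1, 6), (2, 4) sum to 21 cycles and 14 micro-ops.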
132
test/tools/llvm-mca/AArch64/Cortex/A55-all-views.s
Normal file
@@ -0,0 +1,132 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views --iterations=2 < %s | FileCheck %s
|
||||
|
||||
ldr w4, [x2], #4
|
||||
ldr w5, [x3]
|
||||
madd w0, w5, w4, w0
|
||||
add x3, x3, x13
|
||||
subs x1, x1, #1
|
||||
str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Iterations: 2
|
||||
# CHECK-NEXT: Instructions: 12
|
||||
# CHECK-NEXT: Total Cycles: 21
|
||||
# CHECK-NEXT: Total uOps: 14
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.67
|
||||
# CHECK-NEXT: IPC: 0.57
|
||||
# CHECK-NEXT: Block RThroughput: 3.5
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 2 3 1.00 * ldr w4, [x2], #4
|
||||
# CHECK-NEXT: 1 3 1.00 * ldr w5, [x3]
|
||||
# CHECK-NEXT: 1 4 1.00 madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add x3, x3, x13
|
||||
# CHECK-NEXT: 1 3 0.50 subs x1, x1, #1
|
||||
# CHECK-NEXT: 1 4 1.00 * str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Dynamic Dispatch Stall Cycles:
|
||||
# CHECK-NEXT: RAT - Register unavailable: 10 (47.6%)
|
||||
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
|
||||
# CHECK-NEXT: SCHEDQ - Scheduler full: 0
|
||||
# CHECK-NEXT: LQ - Load queue full: 0
|
||||
# CHECK-NEXT: SQ - Store queue full: 0
|
||||
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 0
|
||||
|
||||
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
|
||||
# CHECK-NEXT: [# dispatched], [# cycles]
|
||||
# CHECK-NEXT: 0, 11 (52.4%)
|
||||
# CHECK-NEXT: 1, 6 (28.6%)
|
||||
# CHECK-NEXT: 2, 4 (19.0%)
|
||||
|
||||
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
|
||||
# CHECK-NEXT: [# issued], [# cycles]
|
||||
# CHECK-NEXT: 0, 11 (52.4%)
|
||||
# CHECK-NEXT: 1, 6 (28.6%)
|
||||
# CHECK-NEXT: 2, 4 (19.0%)
|
||||
|
||||
# CHECK: Scheduler's queue usage:
|
||||
# CHECK-NEXT: No scheduler resources used.
|
||||
|
||||
# CHECK: Retire Control Unit - number of cycles where we saw N instructions retired:
|
||||
# CHECK-NEXT: [# retired], [# cycles]
|
||||
# CHECK-NEXT: 0, 14 (66.7%)
|
||||
# CHECK-NEXT: 1, 4 (19.0%)
|
||||
# CHECK-NEXT: 2, 1 (4.8%)
|
||||
# CHECK-NEXT: 3, 2 (9.5%)
|
||||
|
||||
# CHECK: Total ROB Entries: 64
|
||||
# CHECK-NEXT: Max Used ROB Entries: 6 ( 9.4% )
|
||||
# CHECK-NEXT: Average Used ROB Entries per cy: 2 ( 3.1% )
|
||||
|
||||
# CHECK: Register File statistics:
|
||||
# CHECK-NEXT: Total number of mappings created: 14
|
||||
# CHECK-NEXT: Max number of mappings used: 6
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [0.1] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [1] - CortexA55UnitB
|
||||
# CHECK-NEXT: [2] - CortexA55UnitDiv
|
||||
# CHECK-NEXT: [3.0] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [3.1] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [4] - CortexA55UnitFPDIV
|
||||
# CHECK-NEXT: [5.0] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [5.1] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [6] - CortexA55UnitLd
|
||||
# CHECK-NEXT: [7] - CortexA55UnitMAC
|
||||
# CHECK-NEXT: [8] - CortexA55UnitSt
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8]
|
||||
# CHECK-NEXT: 1.00 1.00 - - - - - - - 2.00 1.00 1.00
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8] Instructions:
|
||||
# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr w4, [x2], #4
|
||||
# CHECK-NEXT: - - - - - - - - - 1.00 - - ldr w5, [x3]
|
||||
# CHECK-NEXT: - - - - - - - - - - 1.00 - madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: - 1.00 - - - - - - - - - - add x3, x3, x13
|
||||
# CHECK-NEXT: 1.00 - - - - - - - - - - - subs x1, x1, #1
|
||||
# CHECK-NEXT: - - - - - - - - - - - 1.00 str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Timeline view:
|
||||
# CHECK-NEXT: 0123456789
|
||||
# CHECK-NEXT: Index 0123456789 0
|
||||
|
||||
# CHECK: [0,0] DeeER. . . . ldr w4, [x2], #4
|
||||
# CHECK-NEXT: [0,1] .DeeER . . . ldr w5, [x3]
|
||||
# CHECK-NEXT: [0,2] . DeeeER. . . madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: [0,3] . DeeE-R. . . add x3, x3, x13
|
||||
# CHECK-NEXT: [0,4] . DeeER. . . subs x1, x1, #1
|
||||
# CHECK-NEXT: [0,5] . . DeeeER . . str w0, [x21, x18, lsl #2]
|
||||
# CHECK-NEXT: [1,0] . . DeeER . . ldr w4, [x2], #4
|
||||
# CHECK-NEXT: [1,1] . . DeeER . . ldr w5, [x3]
|
||||
# CHECK-NEXT: [1,2] . . . DeeeER . madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: [1,3] . . . DeeE-R . add x3, x3, x13
|
||||
# CHECK-NEXT: [1,4] . . . DeeER . subs x1, x1, #1
|
||||
# CHECK-NEXT: [1,5] . . . DeeeER str w0, [x21, x18, lsl #2]
|
||||
|
||||
# CHECK: Average Wait times (based on the timeline view):
|
||||
# CHECK-NEXT: [0]: Executions
|
||||
# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
|
||||
# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
|
||||
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
|
||||
|
||||
# CHECK: [0] [1] [2] [3]
|
||||
# CHECK-NEXT: 0. 2 0.0 0.0 0.0 ldr w4, [x2], #4
|
||||
# CHECK-NEXT: 1. 2 0.0 0.0 0.0 ldr w5, [x3]
|
||||
# CHECK-NEXT: 2. 2 0.0 0.0 0.0 madd w0, w5, w4, w0
|
||||
# CHECK-NEXT: 3. 2 0.0 0.0 1.0 add x3, x3, x13
|
||||
# CHECK-NEXT: 4. 2 0.0 0.0 0.0 subs x1, x1, #1
|
||||
# CHECK-NEXT: 5. 2 0.0 0.0 0.0 str w0, [x21, x18, lsl #2]
|
||||
# CHECK-NEXT: 2 0.0 0.0 0.2 <total>
|
128
test/tools/llvm-mca/AArch64/Cortex/A55-in-order-retire.s
Normal file
@@ -0,0 +1,128 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-stats --all-views --iterations=2 < %s | FileCheck %s
|
||||
|
||||
sdiv w12, w21, w0
|
||||
add w8, w8, #1
|
||||
add w1, w2, w0
|
||||
add w3, w4, #1
|
||||
add w5, w6, w0
|
||||
add w7, w9, w0
|
||||
|
||||
# CHECK: Iterations: 2
|
||||
# CHECK-NEXT: Instructions: 12
|
||||
# CHECK-NEXT: Total Cycles: 18
|
||||
# CHECK-NEXT: Total uOps: 12
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.67
|
||||
# CHECK-NEXT: IPC: 0.67
|
||||
# CHECK-NEXT: Block RThroughput: 8.0
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 1 8 8.00 sdiv w12, w21, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add w8, w8, #1
|
||||
# CHECK-NEXT: 1 3 0.50 add w1, w2, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add w3, w4, #1
|
||||
# CHECK-NEXT: 1 3 0.50 add w5, w6, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add w7, w9, w0
|
||||
|
||||
# CHECK: Dynamic Dispatch Stall Cycles:
|
||||
# CHECK-NEXT: RAT - Register unavailable: 0
|
||||
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
|
||||
# CHECK-NEXT: SCHEDQ - Scheduler full: 0
|
||||
# CHECK-NEXT: LQ - Load queue full: 0
|
||||
# CHECK-NEXT: SQ - Store queue full: 0
|
||||
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 5 (27.8%)
|
||||
|
||||
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
|
||||
# CHECK-NEXT: [# dispatched], [# cycles]
|
||||
# CHECK-NEXT: 0, 12 (66.7%)
|
||||
# CHECK-NEXT: 2, 6 (33.3%)
|
||||
|
||||
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
|
||||
# CHECK-NEXT: [# issued], [# cycles]
|
||||
# CHECK-NEXT: 0, 12 (66.7%)
|
||||
# CHECK-NEXT: 2, 6 (33.3%)
|
||||
|
||||
# CHECK: Scheduler's queue usage:
|
||||
# CHECK-NEXT: No scheduler resources used.
|
||||
|
||||
# CHECK: Retire Control Unit - number of cycles where we saw N instructions retired:
|
||||
# CHECK-NEXT: [# retired], [# cycles]
|
||||
# CHECK-NEXT: 0, 16 (88.9%)
|
||||
# CHECK-NEXT: 6, 2 (11.1%)
|
||||
|
||||
# CHECK: Total ROB Entries: 64
|
||||
# CHECK-NEXT: Max Used ROB Entries: 8 ( 12.5% )
|
||||
# CHECK-NEXT: Average Used ROB Entries per cy: 5 ( 7.8% )
|
||||
|
||||
# CHECK: Register File statistics:
|
||||
# CHECK-NEXT: Total number of mappings created: 12
|
||||
# CHECK-NEXT: Max number of mappings used: 8
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [0.1] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [1] - CortexA55UnitB
|
||||
# CHECK-NEXT: [2] - CortexA55UnitDiv
|
||||
# CHECK-NEXT: [3.0] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [3.1] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [4] - CortexA55UnitFPDIV
|
||||
# CHECK-NEXT: [5.0] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [5.1] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [6] - CortexA55UnitLd
|
||||
# CHECK-NEXT: [7] - CortexA55UnitMAC
|
||||
# CHECK-NEXT: [8] - CortexA55UnitSt
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8]
|
||||
# CHECK-NEXT: 2.50 2.50 - 8.00 - - - - - - - -
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8] Instructions:
|
||||
# CHECK-NEXT: - - - 8.00 - - - - - - - - sdiv w12, w21, w0
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w8, w8, #1
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w1, w2, w0
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w3, w4, #1
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w5, w6, w0
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w7, w9, w0
|
||||
|
||||
# CHECK: Timeline view:
|
||||
# CHECK-NEXT: 01234567
|
||||
# CHECK-NEXT: Index 0123456789
|
||||
|
||||
# CHECK: [0,0] DeeeeeeeER. . . sdiv w12, w21, w0
|
||||
# CHECK-NEXT: [0,1] DeeE-----R. . . add w8, w8, #1
|
||||
# CHECK-NEXT: [0,2] .DeeE----R. . . add w1, w2, w0
|
||||
# CHECK-NEXT: [0,3] .DeeE----R. . . add w3, w4, #1
|
||||
# CHECK-NEXT: [0,4] . DeeE---R. . . add w5, w6, w0
|
||||
# CHECK-NEXT: [0,5] . DeeE---R. . . add w7, w9, w0
|
||||
# CHECK-NEXT: [1,0] . . DeeeeeeeER sdiv w12, w21, w0
|
||||
# CHECK-NEXT: [1,1] . . DeeE-----R add w8, w8, #1
|
||||
# CHECK-NEXT: [1,2] . . DeeE----R add w1, w2, w0
|
||||
# CHECK-NEXT: [1,3] . . DeeE----R add w3, w4, #1
|
||||
# CHECK-NEXT: [1,4] . . DeeE---R add w5, w6, w0
|
||||
# CHECK-NEXT: [1,5] . . DeeE---R add w7, w9, w0
|
||||
|
||||
# CHECK: Average Wait times (based on the timeline view):
|
||||
# CHECK-NEXT: [0]: Executions
|
||||
# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
|
||||
# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
|
||||
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
|
||||
|
||||
# CHECK: [0] [1] [2] [3]
|
||||
# CHECK-NEXT: 0. 2 0.0 0.0 0.0 sdiv w12, w21, w0
|
||||
# CHECK-NEXT: 1. 2 0.0 0.0 5.0 add w8, w8, #1
|
||||
# CHECK-NEXT: 2. 2 0.0 0.0 4.0 add w1, w2, w0
|
||||
# CHECK-NEXT: 3. 2 0.0 0.0 4.0 add w3, w4, #1
|
||||
# CHECK-NEXT: 4. 2 0.0 0.0 3.0 add w5, w6, w0
|
||||
# CHECK-NEXT: 5. 2 0.0 0.0 3.0 add w7, w9, w0
|
||||
# CHECK-NEXT: 2 0.0 0.0 3.2 <total>
|
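In the timeline of this test, each `-` cell marks a cycle in which an instruction has finished executing (`E`) but cannot retire (`R`) yet, here because the older `sdiv` is still in flight. Column [3] of the wait-time table is the average of those cells per instruction. The C++ sketch below recomputes it from the timeline rows of one iteration; the function names are hypothetical and the rows are copied from the checks above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Counts the '-' cells of one timeline row, i.e. the cycles spent between the
// end of execution ('E') and retirement ('R').
unsigned cyclesWaitingToRetire(const std::string &Row) {
  return static_cast<unsigned>(std::count(Row.begin(), Row.end(), '-'));
}

// Average of the per-row wait over all rows, as reported on the <total> line.
double averageRetireWait(const std::vector<std::string> &Rows) {
  assert(!Rows.empty() && "need at least one timeline row");
  unsigned Total = 0;
  for (const std::string &Row : Rows)
    Total += cyclesWaitingToRetire(Row);
  return static_cast<double>(Total) / Rows.size();
}
```

For the six rows of the first iteration the waits are 0, 5, 4, 4, 3 and 3 cycles, which averages to about 3.17 and is printed as 3.2. In the out-of-order-retire test that follows, the rows contain no `-` cells and the same column is 0.0.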
129
test/tools/llvm-mca/AArch64/Cortex/A55-out-of-order-retire.s
Normal file
@@ -0,0 +1,129 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-stats --all-views --iterations=2 < %s | FileCheck %s
|
||||
|
||||
fdiv s1, s2, s3
|
||||
add w8, w8, #1
|
||||
add w1, w2, w0
|
||||
add w3, w4, #1
|
||||
add w5, w6, w0
|
||||
add w7, w9, w0
|
||||
|
||||
# CHECK: Iterations: 2
|
||||
# CHECK-NEXT: Instructions: 12
|
||||
# CHECK-NEXT: Total Cycles: 25
|
||||
# CHECK-NEXT: Total uOps: 12
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.48
|
||||
# CHECK-NEXT: IPC: 0.48
|
||||
# CHECK-NEXT: Block RThroughput: 10.0
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 1 13 10.00 fdiv s1, s2, s3
|
||||
# CHECK-NEXT: 1 3 0.50 add w8, w8, #1
|
||||
# CHECK-NEXT: 1 3 0.50 add w1, w2, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add w3, w4, #1
|
||||
# CHECK-NEXT: 1 3 0.50 add w5, w6, w0
|
||||
# CHECK-NEXT: 1 3 0.50 add w7, w9, w0
|
||||
|
||||
# CHECK: Dynamic Dispatch Stall Cycles:
|
||||
# CHECK-NEXT: RAT - Register unavailable: 0
|
||||
# CHECK-NEXT: RCU - Retire tokens unavailable: 0
|
||||
# CHECK-NEXT: SCHEDQ - Scheduler full: 0
|
||||
# CHECK-NEXT: LQ - Load queue full: 0
|
||||
# CHECK-NEXT: SQ - Store queue full: 0
|
||||
# CHECK-NEXT: GROUP - Static restrictions on the dispatch group: 7 (28.0%)
|
||||
|
||||
# CHECK: Dispatch Logic - number of cycles where we saw N micro opcodes dispatched:
|
||||
# CHECK-NEXT: [# dispatched], [# cycles]
|
||||
# CHECK-NEXT: 0, 19 (76.0%)
|
||||
# CHECK-NEXT: 2, 6 (24.0%)
|
||||
|
||||
# CHECK: Schedulers - number of cycles where we saw N micro opcodes issued:
|
||||
# CHECK-NEXT: [# issued], [# cycles]
|
||||
# CHECK-NEXT: 0, 19 (76.0%)
|
||||
# CHECK-NEXT: 2, 6 (24.0%)
|
||||
|
||||
# CHECK: Scheduler's queue usage:
|
||||
# CHECK-NEXT: No scheduler resources used.
|
||||
|
||||
# CHECK: Retire Control Unit - number of cycles where we saw N instructions retired:
|
||||
# CHECK-NEXT: [# retired], [# cycles]
|
||||
# CHECK-NEXT: 0, 18 (72.0%)
|
||||
# CHECK-NEXT: 1, 2 (8.0%)
|
||||
# CHECK-NEXT: 2, 5 (20.0%)
|
||||
|
||||
# CHECK: Total ROB Entries: 64
|
||||
# CHECK-NEXT: Max Used ROB Entries: 7 ( 10.9% )
|
||||
# CHECK-NEXT: Average Used ROB Entries per cy: 2 ( 3.1% )
|
||||
|
||||
# CHECK: Register File statistics:
|
||||
# CHECK-NEXT: Total number of mappings created: 12
|
||||
# CHECK-NEXT: Max number of mappings used: 7
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [0.1] - CortexA55UnitALU
|
||||
# CHECK-NEXT: [1] - CortexA55UnitB
|
||||
# CHECK-NEXT: [2] - CortexA55UnitDiv
|
||||
# CHECK-NEXT: [3.0] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [3.1] - CortexA55UnitFPALU
|
||||
# CHECK-NEXT: [4] - CortexA55UnitFPDIV
|
||||
# CHECK-NEXT: [5.0] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [5.1] - CortexA55UnitFPMAC
|
||||
# CHECK-NEXT: [6] - CortexA55UnitLd
|
||||
# CHECK-NEXT: [7] - CortexA55UnitMAC
|
||||
# CHECK-NEXT: [8] - CortexA55UnitSt
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8]
|
||||
# CHECK-NEXT: 2.50 2.50 - - - - 10.00 - - - - -
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2] [3.0] [3.1] [4] [5.0] [5.1] [6] [7] [8] Instructions:
|
||||
# CHECK-NEXT: - - - - - - 10.00 - - - - - fdiv s1, s2, s3
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w8, w8, #1
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w1, w2, w0
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w3, w4, #1
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w5, w6, w0
|
||||
# CHECK-NEXT: 0.50 0.50 - - - - - - - - - - add w7, w9, w0
|
||||
|
||||
# CHECK: Timeline view:
|
||||
# CHECK-NEXT: 0123456789
|
||||
# CHECK-NEXT: Index 0123456789 01234
|
||||
|
||||
# CHECK: [0,0] DeeeeeeeeeeeeER. . . fdiv s1, s2, s3
|
||||
# CHECK-NEXT: [0,1] DeeER. . . . . add w8, w8, #1
|
||||
# CHECK-NEXT: [0,2] .DeeER . . . . add w1, w2, w0
|
||||
# CHECK-NEXT: [0,3] .DeeER . . . . add w3, w4, #1
|
||||
# CHECK-NEXT: [0,4] . DeeER . . . . add w5, w6, w0
|
||||
# CHECK-NEXT: [0,5] . DeeER . . . . add w7, w9, w0
|
||||
# CHECK-NEXT: [1,0] . . DeeeeeeeeeeeeER fdiv s1, s2, s3
|
||||
# CHECK-NEXT: [1,1] . . DeeER. . . add w8, w8, #1
|
||||
# CHECK-NEXT: [1,2] . . .DeeER . . add w1, w2, w0
|
||||
# CHECK-NEXT: [1,3] . . .DeeER . . add w3, w4, #1
|
||||
# CHECK-NEXT: [1,4] . . . DeeER . . add w5, w6, w0
|
||||
# CHECK-NEXT: [1,5] . . . DeeER . . add w7, w9, w0
|
||||
|
||||
# CHECK: Average Wait times (based on the timeline view):
|
||||
# CHECK-NEXT: [0]: Executions
|
||||
# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
|
||||
# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
|
||||
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
|
||||
|
||||
# CHECK: [0] [1] [2] [3]
|
||||
# CHECK-NEXT: 0. 2 0.0 0.0 0.0 fdiv s1, s2, s3
|
||||
# CHECK-NEXT: 1. 2 0.0 0.0 0.0 add w8, w8, #1
|
||||
# CHECK-NEXT: 2. 2 0.0 0.0 0.0 add w1, w2, w0
|
||||
# CHECK-NEXT: 3. 2 0.0 0.0 0.0 add w3, w4, #1
|
||||
# CHECK-NEXT: 4. 2 0.0 0.0 0.0 add w5, w6, w0
|
||||
# CHECK-NEXT: 5. 2 0.0 0.0 0.0 add w7, w9, w0
|
||||
# CHECK-NEXT: 2 0.0 0.0 0.0 <total>
|
@@ -0,0 +1,8 @@
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --all-views < %s | FileCheck %s
|
||||
# CHECK-NOT: Throughput Bottlenecks
|
||||
|
||||
# RUN: llvm-mca -mtriple=aarch64 -mcpu=cortex-a55 --bottleneck-analysis < %s -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK-WARN
|
||||
# CHECK-WARN: warning: bottleneck analysis is not supported for in-order CPU 'cortex-a55'
|
||||
|
||||
add w2, w3, #1
|
||||
|
75
test/tools/llvm-mca/ARM/m7-negative-readadvance.s
Normal file
@@ -0,0 +1,75 @@
|
||||
# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
|
||||
# RUN: llvm-mca -mtriple=arm -mcpu=cortex-m7 --timeline --iterations=1 < %s | FileCheck %s
|
||||
|
||||
add r1, r1, #1
|
||||
# ReadAdvance: 0
|
||||
add r1, r1, #2
|
||||
# ReadAdvance: -1
|
||||
vldr d0, [r1]
|
||||
|
||||
# CHECK: Iterations: 1
|
||||
# CHECK-NEXT: Instructions: 3
|
||||
# CHECK-NEXT: Total Cycles: 7
|
||||
# CHECK-NEXT: Total uOps: 3
|
||||
|
||||
# CHECK: Dispatch Width: 2
|
||||
# CHECK-NEXT: uOps Per Cycle: 0.43
|
||||
# CHECK-NEXT: IPC: 0.43
|
||||
# CHECK-NEXT: Block RThroughput: 1.5
|
||||
|
||||
# CHECK: Instruction Info:
|
||||
# CHECK-NEXT: [1]: #uOps
|
||||
# CHECK-NEXT: [2]: Latency
|
||||
# CHECK-NEXT: [3]: RThroughput
|
||||
# CHECK-NEXT: [4]: MayLoad
|
||||
# CHECK-NEXT: [5]: MayStore
|
||||
# CHECK-NEXT: [6]: HasSideEffects (U)
|
||||
|
||||
# CHECK: [1] [2] [3] [4] [5] [6] Instructions:
|
||||
# CHECK-NEXT: 1 1 0.50 add.w r1, r1, #1
|
||||
# CHECK-NEXT: 1 1 0.50 add.w r1, r1, #2
|
||||
# CHECK-NEXT: 1 3 1.00 * vldr d0, [r1]
|
||||
|
||||
# CHECK: Resources:
|
||||
# CHECK-NEXT: [0.0] - M7UnitALU
|
||||
# CHECK-NEXT: [0.1] - M7UnitALU
|
||||
# CHECK-NEXT: [1] - M7UnitBranch
|
||||
# CHECK-NEXT: [2.0] - M7UnitLoad
|
||||
# CHECK-NEXT: [2.1] - M7UnitLoad
|
||||
# CHECK-NEXT: [3] - M7UnitMAC
|
||||
# CHECK-NEXT: [4] - M7UnitSIMD
|
||||
# CHECK-NEXT: [5] - M7UnitShift1
|
||||
# CHECK-NEXT: [6] - M7UnitShift2
|
||||
# CHECK-NEXT: [7] - M7UnitStore
|
||||
# CHECK-NEXT: [8] - M7UnitVFP
|
||||
# CHECK-NEXT: [9.0] - M7UnitVPort
|
||||
# CHECK-NEXT: [9.1] - M7UnitVPort
|
||||
|
||||
# CHECK: Resource pressure per iteration:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2.0] [2.1] [3] [4] [5] [6] [7] [8] [9.0] [9.1]
|
||||
# CHECK-NEXT: 1.00 1.00 - - 1.00 - - - - - - - 2.00
|
||||
|
||||
# CHECK: Resource pressure by instruction:
|
||||
# CHECK-NEXT: [0.0] [0.1] [1] [2.0] [2.1] [3] [4] [5] [6] [7] [8] [9.0] [9.1] Instructions:
|
||||
# CHECK-NEXT: - 1.00 - - - - - - - - - - - add.w r1, r1, #1
|
||||
# CHECK-NEXT: 1.00 - - - - - - - - - - - - add.w r1, r1, #2
|
||||
# CHECK-NEXT: - - - - 1.00 - - - - - - - 2.00 vldr d0, [r1]
|
||||
|
||||
# CHECK: Timeline view:
|
||||
# CHECK-NEXT: Index 0123456
|
||||
|
||||
# CHECK: [0,0] DER .. add.w r1, r1, #1
|
||||
# CHECK-NEXT: [0,1] .DER .. add.w r1, r1, #2
|
||||
# CHECK-NEXT: [0,2] . DeER vldr d0, [r1]
|
||||
|
||||
# CHECK: Average Wait times (based on the timeline view):
|
||||
# CHECK-NEXT: [0]: Executions
|
||||
# CHECK-NEXT: [1]: Average time spent waiting in a scheduler's queue
|
||||
# CHECK-NEXT: [2]: Average time spent waiting in a scheduler's queue while ready
|
||||
# CHECK-NEXT: [3]: Average time elapsed from WB until retire stage
|
||||
|
||||
# CHECK: [0] [1] [2] [3]
|
||||
# CHECK-NEXT: 0. 1 0.0 0.0 0.0 add.w r1, r1, #1
|
||||
# CHECK-NEXT: 1. 1 0.0 0.0 0.0 add.w r1, r1, #2
|
||||
# CHECK-NEXT: 2. 1 0.0 0.0 0.0 vldr d0, [r1]
|
||||
# CHECK-NEXT: 1 0.0 0.0 0.0 <total>
|
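The timeline in this test is consistent with a negative ReadAdvance pushing the consumer further away from its producer: the second `add` issues at cycle 1 with latency 1, and the `vldr`, annotated with ReadAdvance -1, issues at cycle 3 rather than cycle 2. The C++ sketch below assumes the effective latency is the write latency minus the ReadAdvance, clamped at zero; the function names are hypothetical and this is not the simulator's own code.

```cpp
#include <algorithm>
#include <cassert>

// Cycles a dependent instruction waits after its producer issues.
// A positive ReadAdvance shortens the wait (the operand is read late),
// a negative one lengthens it (the operand is needed early).
int effectiveLatency(int WriteLatency, int ReadAdvance) {
  assert(WriteLatency >= 0 && "write latency cannot be negative");
  return std::max(0, WriteLatency - ReadAdvance);
}

// Earliest cycle at which the dependent instruction can issue.
int earliestIssueCycle(int ProducerIssueCycle, int WriteLatency,
                       int ReadAdvance) {
  assert(ProducerIssueCycle >= 0 && "cycles are counted from zero");
  return ProducerIssueCycle + effectiveLatency(WriteLatency, ReadAdvance);
}
```

With these assumptions, the first `add` (cycle 0, latency 1, ReadAdvance 0) lets the second `add` issue at cycle 1, and the second `add` (cycle 1, latency 1, ReadAdvance -1) lets the `vldr` issue at cycle 3, which matches the seven-cycle timeline.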
@@ -1,3 +1,3 @@
|
||||
# RUN: not llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=atom -o /dev/null 2>&1 | FileCheck %s
|
||||
|
||||
# CHECK: error: please specify an out-of-order cpu. 'atom' is an in-order cpu.
|
||||
# RUN: llvm-mca %s -mtriple=x86_64-unknown-unknown -mcpu=atom -o /dev/null 2>&1 | FileCheck %s
|
||||
# CHECK: warning: support for in-order CPU 'atom' is experimental.
|
||||
movsbw %al, %di
|
||||
|
@@ -257,14 +257,15 @@ static void processOptionImpl(cl::opt<bool> &O, const cl::opt<bool> &Default) {
     O = Default.getValue();
 }

-static void processViewOptions() {
+static void processViewOptions(bool IsOutOfOrder) {
   if (!EnableAllViews.getNumOccurrences() &&
       !EnableAllStats.getNumOccurrences())
     return;

   if (EnableAllViews.getNumOccurrences()) {
     processOptionImpl(PrintSummaryView, EnableAllViews);
-    processOptionImpl(EnableBottleneckAnalysis, EnableAllViews);
+    if (IsOutOfOrder)
+      processOptionImpl(EnableBottleneckAnalysis, EnableAllViews);
     processOptionImpl(PrintResourcePressureView, EnableAllViews);
     processOptionImpl(PrintTimelineView, EnableAllViews);
     processOptionImpl(PrintInstructionInfoView, EnableAllViews);
@@ -327,9 +328,6 @@ int main(int argc, char **argv) {
     return 1;
   }

-  // Apply overrides to llvm-mca specific options.
-  processViewOptions();
-
   if (MCPU == "native")
     MCPU = std::string(llvm::sys::getHostCPUName());

@@ -339,10 +337,10 @@ int main(int argc, char **argv) {
   if (!STI->isCPUStringValid(MCPU))
     return 1;

-  if (!PrintInstructionTables && !STI->getSchedModel().isOutOfOrder()) {
-    WithColor::error() << "please specify an out-of-order cpu. '" << MCPU
-                       << "' is an in-order cpu.\n";
-    return 1;
+  bool IsOutOfOrder = STI->getSchedModel().isOutOfOrder();
+  if (!PrintInstructionTables && !IsOutOfOrder) {
+    WithColor::warning() << "support for in-order CPU '" << MCPU
+                         << "' is experimental.\n";
   }

   if (!STI->getSchedModel().hasInstrSchedModel()) {
@@ -358,6 +356,9 @@ int main(int argc, char **argv) {
     return 1;
   }

+  // Apply overrides to llvm-mca specific options.
+  processViewOptions(IsOutOfOrder);
+
   std::unique_ptr<MCRegisterInfo> MRI(TheTarget->createMCRegInfo(TripleName));
   assert(MRI && "Unable to create target register info!");

@@ -539,6 +540,11 @@ int main(int argc, char **argv) {
           std::make_unique<mca::SummaryView>(SM, Insts, DispatchWidth));

     if (EnableBottleneckAnalysis) {
+      if (!IsOutOfOrder) {
+        WithColor::warning()
+            << "bottleneck analysis is not supported for in-order CPU '" << MCPU
+            << "'.\n";
+      }
       Printer.addView(std::make_unique<mca::BottleneckAnalysis>(
           *STI, *IP, Insts, S.getNumIterations()));
     }
|
@@ -993,6 +993,7 @@ void SubtargetEmitter::GenSchedClassTables(const CodeGenProcModel &ProcModel,
     SCDesc.NumMicroOps = 0;
     SCDesc.BeginGroup = false;
     SCDesc.EndGroup = false;
+    SCDesc.RetireOOO = false;
     SCDesc.WriteProcResIdx = 0;
     SCDesc.WriteLatencyIdx = 0;
     SCDesc.ReadAdvanceIdx = 0;
@@ -1095,6 +1096,7 @@ void SubtargetEmitter::GenSchedClassTables(const CodeGenProcModel &ProcModel,
       SCDesc.EndGroup |= WriteRes->getValueAsBit("EndGroup");
       SCDesc.BeginGroup |= WriteRes->getValueAsBit("SingleIssue");
       SCDesc.EndGroup |= WriteRes->getValueAsBit("SingleIssue");
+      SCDesc.RetireOOO |= WriteRes->getValueAsBit("RetireOOO");

       // Create an entry for each ProcResource listed in WriteRes.
       RecVec PRVec = WriteRes->getValueAsListOfDefs("ProcResources");
@@ -1293,7 +1295,7 @@ void SubtargetEmitter::EmitSchedClassTables(SchedClassTables &SchedTables,
     std::vector<MCSchedClassDesc> &SCTab =
       SchedTables.ProcSchedClasses[1 + (PI - SchedModels.procModelBegin())];

-    OS << "\n// {Name, NumMicroOps, BeginGroup, EndGroup,"
+    OS << "\n// {Name, NumMicroOps, BeginGroup, EndGroup, RetireOOO,"
        << " WriteProcResIdx,#, WriteLatencyIdx,#, ReadAdvanceIdx,#}\n";
     OS << "static const llvm::MCSchedClassDesc "
        << PI->ModelName << "SchedClasses[] = {\n";
@@ -1304,7 +1306,7 @@ void SubtargetEmitter::EmitSchedClassTables(SchedClassTables &SchedTables,
            && "invalid class not first");
     OS << " {DBGFIELD(\"InvalidSchedClass\") "
        << MCSchedClassDesc::InvalidNumMicroOps
-       << ", false, false, 0, 0, 0, 0, 0, 0},\n";
+       << ", false, false, false, 0, 0, 0, 0, 0, 0},\n";

     for (unsigned SCIdx = 1, SCEnd = SCTab.size(); SCIdx != SCEnd; ++SCIdx) {
       MCSchedClassDesc &MCDesc = SCTab[SCIdx];
@@ -1315,6 +1317,7 @@ void SubtargetEmitter::EmitSchedClassTables(SchedClassTables &SchedTables,
       OS << MCDesc.NumMicroOps
          << ", " << ( MCDesc.BeginGroup ? "true" : "false" )
          << ", " << ( MCDesc.EndGroup ? "true" : "false" )
+         << ", " << ( MCDesc.RetireOOO ? "true" : "false" )
          << ", " << format("%2d", MCDesc.WriteProcResIdx)
          << ", " << MCDesc.NumWriteProcResEntries
          << ", " << format("%2d", MCDesc.WriteLatencyIdx)
|