//===---------------------------- Context.cpp -------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
/// \file
///
/// This file defines a class for holding ownership of various simulated
/// hardware units. A Context also provides a utility routine for constructing
/// a default out-of-order pipeline with fetch, dispatch, execute, and retire
/// stages.
///
//===----------------------------------------------------------------------===//

#include "llvm/MCA/Context.h"
#include "llvm/MCA/HardwareUnits/RegisterFile.h"
#include "llvm/MCA/HardwareUnits/RetireControlUnit.h"
#include "llvm/MCA/HardwareUnits/Scheduler.h"
#include "llvm/MCA/Stages/DispatchStage.h"
#include "llvm/MCA/Stages/EntryStage.h"
#include "llvm/MCA/Stages/ExecuteStage.h"
#include "llvm/MCA/Stages/MicroOpQueueStage.h"
#include "llvm/MCA/Stages/RetireStage.h"

namespace llvm {
namespace mca {

std::unique_ptr<Pipeline>
Context::createDefaultPipeline(const PipelineOptions &Opts, SourceMgr &SrcMgr) {
  const MCSchedModel &SM = STI.getSchedModel();

  // Create the hardware units defining the backend.
  auto RCU = std::make_unique<RetireControlUnit>(SM);
  auto PRF = std::make_unique<RegisterFile>(SM, MRI, Opts.RegisterFileSize);
  auto LSU = std::make_unique<LSUnit>(SM, Opts.LoadQueueSize,
                                      Opts.StoreQueueSize, Opts.AssumeNoAlias);
  auto HWS = std::make_unique<Scheduler>(SM, *LSU);

  // Create the pipeline stages.
  auto Fetch = std::make_unique<EntryStage>(SrcMgr);
  auto Dispatch = std::make_unique<DispatchStage>(STI, MRI, Opts.DispatchWidth,
                                                  *RCU, *PRF);
  auto Execute =
      std::make_unique<ExecuteStage>(*HWS, Opts.EnableBottleneckAnalysis);
  auto Retire = std::make_unique<RetireStage>(*RCU, *PRF, *LSU);

  // Pass the ownership of all the hardware units to this Context.
  addHardwareUnit(std::move(RCU));
  addHardwareUnit(std::move(PRF));
  addHardwareUnit(std::move(LSU));
  addHardwareUnit(std::move(HWS));

  // Build the pipeline.
  auto StagePipeline = std::make_unique<Pipeline>();
  StagePipeline->appendStage(std::move(Fetch));
  if (Opts.MicroOpQueueSize)
    StagePipeline->appendStage(std::make_unique<MicroOpQueueStage>(
        Opts.MicroOpQueueSize, Opts.DecodersThroughput));
  StagePipeline->appendStage(std::move(Dispatch));
  StagePipeline->appendStage(std::move(Execute));
  StagePipeline->appendStage(std::move(Retire));
  return StagePipeline;
}

} // namespace mca
} // namespace llvm