1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
Commit Graph

85 Commits

Author SHA1 Message Date
Andrea Di Biagio
b5b9678e73 [MCA] Moved the bottleneck analysis to its own file. NFCI
llvm-svn: 358554
2019-04-17 06:02:05 +00:00
Andrea Di Biagio
873c7239ab [MCA] Add an experimental MicroOpQueue stage.
This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch').  Users can specify a queue
size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).

This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.

Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput.  That flag allows us to quickly experiment with different
frontend throughputs.  For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.

This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.

Differential Revision: https://reviews.llvm.org/D59928

llvm-svn: 357248
2019-03-29 12:15:37 +00:00
Andrea Di Biagio
a1cb49539d [MCA] Correctly update the UsedResourceGroups mask in the InstrBuilder.
Found by inspection when looking at the debug output of MCA.
This problem was latent, and none of the upstream models were affected by it.
No functional change intended.

llvm-svn: 357000
2019-03-26 15:38:37 +00:00
Matt Davis
28382105e7 [llvm-mca] Emit a message when no bottlenecks are identified.
Summary:
Since bottleneck hints are enabled via user request, it can be
confusing if no bottleneck information is presented.  Such is the
case when no bottlenecks are identified.  This patch emits a message
in that case.

Reviewers: andreadb

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59098

llvm-svn: 355628
2019-03-07 19:34:44 +00:00
Andrea Di Biagio
c5a150eca8 [MCA] Highlight kernel bottlenecks in the summary view.
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle.  Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.

This patch teaches how to identify situations where backend pressure increases
due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.

ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.

The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.

About the time complexity:

Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.

The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

llvm-svn: 355308
2019-03-04 11:52:34 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Matt Davis
8c2021a240 [llvm-mca] Rename an error variable.
llvm-svn: 349662
2018-12-19 18:57:43 +00:00
Matt Davis
413bf25b3d [llvm-mca] Add an error handler for error from parseCodeRegions
Summary:
It's a bit tricky to add a test for the failing path right now, binary support will have an easier path to exercise the path here.

* Ran clang-format.



Reviewers: andreadb

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D55803

llvm-svn: 349659
2018-12-19 18:27:05 +00:00
Andrea Di Biagio
e6ecfd3ed6 [MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer.
Class InstrBuilder wrongly assumed that llvm targets were always able to return
a non-null pointer when createMCInstrAnalysis() was called on them.
This was causing crashes when simulating executions for targets that don't
provide an MCInstrAnalysis object.
This patch fixes the issue by making MCInstrAnalysis optional.

llvm-svn: 349352
2018-12-17 14:00:37 +00:00
Clement Courbet
9093bbf39e [llvm-mca] Move llvm-mca library to llvm/lib/MCA.
Summary: See PR38731.

Reviewers: andreadb

Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D55557

llvm-svn: 349332
2018-12-17 08:08:31 +00:00
Andrea Di Biagio
a900121de4 [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).
This patch adds the ability to specify via tablegen which processor resources
are load/store queue resources.

A new tablegen class named MemoryQueue can be optionally used to mark resources
that model load/store queues.  Information about the load/store queue is
collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to
initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and
`StoreQueueID`.  Those two fields are identifiers for buffered resources used to
describe the load queue and the store queue.
Field `BufferSize` is interpreted as the number of entries in the queue, while
the number of units is a throughput indicator (i.e. number of available pickers
for loads/stores).

At construction time, LSUnit in llvm-mca checks for the presence of extra
processor information (i.e. MCExtraProcessorInfo) in the scheduling model.  If
that information is available, and fields LoadQueueID and StoreQueueID are set
to a value different than zero (i.e. the invalid processor resource index), then
LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value
declared by the two processor resources.

With this patch, we more accurately track dynamic dispatch stalls caused by the
lack of LS tokens (i.e. load/store queue full). This is also shown by the
differences in two BdVer2 tests. Stalls that were previously classified as
generic SCHEDULER FULL stalls, are not correctly classified either as "load
queue full" or "store queue full".

About the differences in the -scheduler-stats view: those differences are
expected, because entries in the load/store queue are not released at
instruction issue stage. Instead, those are released at instruction executed
stage.  This is the main reason why for the modified tests, the load/store
queues gets full before PdEx is full.

Differential Revision: https://reviews.llvm.org/D54957

llvm-svn: 347857
2018-11-29 12:15:56 +00:00
Andrea Di Biagio
dadc75adba Reapply "[llvm-mca] Return the total number of cycles from method Pipeline::run()."
This reapplies r347767 (originally reviewed at: https://reviews.llvm.org/D55000)
with a fix for the missing std::move of the Error returned by the call to
Pipeline::runCycle().

Below is the original commit message from r347767.

If a user only cares about the overall latency, then the best/quickest way is to
change method Pipeline::run() so that it returns the total number of cycles to
the caller.

When the simulation pipeline is run, the number of cycles (or an error) is
returned from method Pipeline::run().
The advantage is that no hardware event listener is needed for computing that
latency. So, the whole process should be faster (and simpler - at least for that
particular use case).

llvm-svn: 347795
2018-11-28 19:31:19 +00:00
Andrea Di Biagio
8a3d22a34f Revert [llvm-mca] Return the total number of cycles from method Pipeline::run().
This reverts commits 347767.

llvm-svn: 347775
2018-11-28 16:39:48 +00:00
Andrea Di Biagio
8c9f6063a8 [llvm-mca] Return the total number of cycles from method Pipeline::run().
If a user only cares about the overall latency, then the best/quickest way is to
change method Pipeline::run() so that it returns the total number of cycles to
the caller.

When the simulation pipeline is run, the number of cycles (or an error) is
returned from method Pipeline::run().
The advantage is that no hardware event listener is needed for computing that
latency. So, the whole process should be faster (and simpler - at least for that
particular use case).

llvm-svn: 347767
2018-11-28 16:24:51 +00:00
Andrea Di Biagio
57fcc40fb8 [llvm-mca][View] Improved Retire Control Unit Statistics.
RetireControlUnitStatistics now reports extra information about the ROB and the
avg/maximum number of entries consumed over the entire simulation.

Example:
  Retire Control Unit - number of cycles where we saw N instructions retired:
  [# retired], [# cycles]
   0,           109  (17.9%)
   1,           102  (16.7%)
   2,           399  (65.4%)

  Total ROB Entries:                64
  Max Used ROB Entries:             35  ( 54.7% )
  Average Used ROB Entries per cy:  32  ( 50.0% )

Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to
reflect this change.

llvm-svn: 347493
2018-11-23 12:12:57 +00:00
Matt Davis
c7fa72f403 [llvm-mca] Partially revert r346417.
Restored the llvm:: namespace qualifier on make_unique.
This removes the ambiguity with make_unique.  

llvm-svn: 346424
2018-11-08 18:08:43 +00:00
Andrea Di Biagio
b212088edb [llvm-mca] PR39261: Rename FetchStage to EntryStage.
This fixes PR39261.

FetchStage is a misnomer. It causes confusion with the frontend fetch stage,
which we don't currently simulate.  I decided to rename it into EntryStage
mainly because this is meant to be a "source" stage for all pipelines.

Differential Revision: https://reviews.llvm.org/D54268

llvm-svn: 346419
2018-11-08 17:49:30 +00:00
Matt Davis
5ff2f474ff [llvm-mca] Remove unneeded namespace qualifier. NFC.
llvm-svn: 346417
2018-11-08 17:32:45 +00:00
Matt Davis
89fe6b1c0c [llvm-mca] Move the AssembleInput logic into its own class.
Summary:
This patch introduces a CodeRegionGenerator class which is responsible for parsing some type of input and creating a 'CodeRegions' instance for use by llvm-mca.  In the future, we will also have a CodeRegionGenerator subclass for converting an input object file into CodeRegions.  For now, we only have the subclass for converting input assembly into CodeRegions.

This is mostly a NFC patch, as the logic remains close to the original, but now encapsulated in its own class and moved outside of llvm-mca.cpp.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb

Subscribers: mgorny, tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D54179

llvm-svn: 346344
2018-11-07 19:20:04 +00:00
Matt Davis
822bf2926c [llvm-mca] Remove the verb 'assemble' from a few options in help. NFC.
* MCA does not assemble anything.
* Ran clang-format.

llvm-svn: 345750
2018-10-31 17:47:25 +00:00
Andrea Di Biagio
26c424fe32 [llvm-mca] Lower to mca::Instructon before the pipeline is run.
Before this change, the lowering of instructions from llvm::MCInst to
mca::Instruction was done as part of the first stage of the pipeline (i.e. the
FetchStage).  In particular, FetchStage was responsible for picking the next
instruction from the source sequence, and lower it to an mca::Instruction with
the help of an object of class InstrBuilder.

The dependency on InstrBuilder was problematic for a number of reasons. Class
InstrBuilder only knows how to lower from llvm::MCInst to mca::Instruction.
That means, it is hard to support a different scenario where instructions
in input are not instances of class llvm::MCInst. Even if we managed to
specialize InstrBuilder, and generalize most of its internal logic, the
dependency on InstrBuilder in FetchStage would have caused more troubles (other
than complicating the pipeline logic).

With this patch, the lowering step is done before the pipeline is run. The
pipeline is no longer responsible for lowering from MCInst to mca::Instruction.
As a consequence of this, the FetchStage no longer needs to interact with an
InstrBuilder. The mca::SourceMgr class now simply wraps a reference to a
sequence of mca::Instruction objects.
This simplifies the logic of FetchStage, and increases the usability of it.  As
a result, on a debug build, we see a 7-9% speedup; on a release build, the
speedup is around 3-4%.

llvm-svn: 345500
2018-10-29 13:29:22 +00:00
Andrea Di Biagio
cb2576c9f1 [llvm-mca] Removed dependency on mca::SourcMgr in some Views. NFC
llvm-svn: 345376
2018-10-26 10:48:04 +00:00
Andrea Di Biagio
3d5c233899 [llvm-mca] Remove dependency from InstrBuilder in class InstructionTables.
Also, removed the initialization of vectors used for processor resource masks.
Support function 'computeProcResourceMasks()' already calls method resize on
those vectors.
No functional change intended.

llvm-svn: 345161
2018-10-24 16:56:43 +00:00
Andrea Di Biagio
694545a47e [llvm-mca] [llvm-mca] Improved error handling and error reporting from class InstrBuilder.
A new class named InstructionError has been added to Support.h in order to
improve the error reporting from class InstrBuilder.
The llvm-mca driver is responsible for handling InstructionError objects, and
printing them out to stderr.

The goal of this patch is to remove all the remaining error handling logic from
the library code.
In particular, this allows us to:
 - Simplify the logic in InstrBuilder by removing a needless dependency from
MCInstrPrinter.
 - Centralize all the error halding logic in a new function named 'runPipeline'
(see llvm-mca.cpp).

This is also a first step towards generalizing class InstrBuilder, so that in
future, we will be able to reuse its logic to also "lower" MachineInstr to
mca::Instruction objects.

Differential Revision: https://reviews.llvm.org/D53585

llvm-svn: 345129
2018-10-24 10:56:47 +00:00
Andrea Di Biagio
05decf8cc3 [llvm-mca] Use llvm::ArrayRef in class SourceMgr. NFCI
Class SourceMgr now uses type ArrayRef<MCInst> to reference the
sequence of code from a "CodeRegion".

llvm-svn: 344911
2018-10-22 15:36:15 +00:00
Owen Rodley
cfcabbacc1 Test commit. NFC.
llvm-svn: 343296
2018-09-28 04:51:45 +00:00
Andrea Di Biagio
989c373718 [llvm-mca] Don't disable the SummaryView if flag -all-stats is false.
llvm-svn: 340945
2018-08-29 17:40:04 +00:00
Matt Davis
46b8d41e4e [llvm-mca] Introduce the llvm-mca library and organize the directory accordingly. NFC.
Summary:
This patch introduces llvm-mca as a library.  The driver (llvm-mca.cpp), views, and stats, are not part of the library. 
Those are separate components that are not required for the functioning of llvm-mca.

The directory has been organized as follows:
All library source files now reside in:
  - `lib/HardwareUnits/` - All subclasses of HardwareUnit (these represent the simulated hardware components of a backend).
      (LSUnit does not inherit from HardwareUnit, but Scheduler does which uses LSUnit).  
  - `lib/Stages/` - All subclasses of the pipeline stages.
  - `lib/` - This is the root of the library and contains library code that does not fit into the Stages or HardwareUnit subdirs.

All library header files now reside in the `include` directory and mimic the same layout as the `lib` directory mentioned above.

In the (near) future we would like to move the library (include and lib) contents from tools and into the core of llvm somewhere.
That change would allow various analysis and optimization passes to make use of MCA  functionality for things like cost modeling.

I left all of the non-library code just where it has always been, in the root of the llvm-mca directory. 
The include directives for the non-library source file have been updated to refer to the llvm-mca library headers.
I updated the llvm-mca/CMakeLists.txt file to include the library headers, but I made the non-library code
explicitly reference the library's 'include' directory.  Once we eventually (hopefully) migrate the MCA library
components into llvm the include directives used by the non-library source files will be updated to point to the
proper location in llvm.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb

Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D50929

llvm-svn: 340755
2018-08-27 17:16:32 +00:00
Matt Davis
1e4dfd0b90 [llvm-mca] Move views and stats into a Views subdir. NFC.
llvm-svn: 340645
2018-08-24 20:24:53 +00:00
Matt Davis
3ace459e8d [llvm-mca] Remove unused formal parameter. NFC.
llvm-svn: 340227
2018-08-20 22:41:27 +00:00
Matt Davis
8f7fca4891 [llvm-mca] Propagate fatal llvm-mca errors from library classes to driver.
Summary:
This patch introduces error handling to propagate the errors from llvm-mca library classes (or what will become library classes) up to the driver.  This patch also introduces an enum to make clearer the intention of the return value for Stage::execute.

This supports PR38101.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb

Subscribers: llvm-commits, tschuett, gbedwell

Differential Revision: https://reviews.llvm.org/D50561

llvm-svn: 339594
2018-08-13 18:11:48 +00:00
Petr Hosek
a98aa0866c [ADT] Normalize empty triple components
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".

This addresses PR37129.

Differential Revision: https://reviews.llvm.org/D50219

llvm-svn: 339294
2018-08-08 22:23:57 +00:00
Matt Davis
d0dbf0dd79 [llvm-mca] Update the help text to reflect "physical" registers. NFC.
llvm-svn: 338430
2018-07-31 20:05:08 +00:00
Matt Davis
7599dd7f50 [llvm-mca] Turn InstructionTables into a Stage.
Summary:
This patch converts the InstructionTables class into a subclass of mca::Stage.  This change allows us to use the Stage's inherited Listeners for event notifications.  This also allows us to create a simple pipeline for viewing the InstructionTables report.

I have been working on a follow on patch that should cleanup addView in InstructionTables.  Right now, addView adds the view to both the Listener list and Views list.  The follow-on patch addresses the fact that we don't really need two lists in this case.  That change is not specific to just InstructionTables, so it will be a separate patch. 

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D49329

llvm-svn: 337113
2018-07-14 23:52:50 +00:00
Andrea Di Biagio
aaaf1382f8 [llvm-mca] report an error if the assembly sequence contains an unsupported instruction.
This is a short-term fix for PR38093.
For now, we llvm::report_fatal_error if the instruction builder finds an
unsupported instruction in the instruction stream.

We need to revisit this fix once we start addressing PR38101.
Essentially, we need a better framework for error handling.

llvm-svn: 336543
2018-07-09 12:30:55 +00:00
Matt Davis
7464e6c913 [llvm-mca] Add HardwareUnit and Context classes.
This patch moves the construction of the default backend from llvm-mca.cpp and
into mca::Context. The Context class is responsible for holding ownership of
the simulated hardware components. These components are subclasses of
HardwareUnit. Right now the HardwareUnit is pretty bare-bones, but eventually
we might want to add some common functionality across all hardware components,
such as isReady() or something similar.

I have a feeling this patch will probably need some updates, but it's a start.
One thing I am not particularly fond of is the rather large interface for
createDefaultPipeline. That convenience routine takes a rather large set of
inputs from the llvm-mca driver, where many of those inputs are generated via
command line options.

One item I think we might want to change is the separating of ownership of
hardware components (owned by the context) and the pipeline (which owns
Stages). In short, a Pipeline owns Stages, a Context (currently) owns hardware.
The Pipeline's Stages make use of the components, and thus there is a lifetime
dependency generated. The components must outlive the pipeline. We could solve
this by having the Context also own the Pipeline, and not return a
unique_ptr<Pipeline>. Now that I think about it, I like that idea more.

Differential Revision: https://reviews.llvm.org/D48691

llvm-svn: 336456
2018-07-06 18:03:14 +00:00
Andrea Di Biagio
9758b05b3b [llvm-mca] Clear the content of map VariantDescriptors in InstrBuilder before we start analyzing a new CodeBlock. NFCI.
Different CodeBlocks don't overlap. The same MCInst cannot appear in more than
one code block because all blocks are instantiated before the simulation is run.

We should always clear the content of map VariantDescriptors before every
simulation, since VariantDescriptors cannot possibly store useful information
for the next blocks. It is also "safer" to clear its content because `MCInst*`
is used as the key type for map VariantDescriptors.

llvm-svn: 336142
2018-07-02 20:39:57 +00:00
Francis Visoiu Mistrih
bb4cf24f14 [MC] Error on a .zerofill directive in a non-virtual section
On darwin, all virtual sections have zerofill type, and having a
.zerofill directive in a non-virtual section is not allowed. Instead of
asserting, show a nicer error.

In order to use the equivalent of .zerofill in a non-virtual section,
the usage of .zero of .space is required.

This patch replaces the assert with an error.

Differential Revision: https://reviews.llvm.org/D48517

llvm-svn: 336127
2018-07-02 17:29:43 +00:00
Matt Davis
92f0c6b4ce [llvm-mca] Register listeners with stages; remove Pipeline dependency from Stage.
Summary:
This patch removes a few callbacks from Pipeline.  It comes at the cost of
registering Listeners with all Stages.  Not all stages need listeners or issue
callbacks, this registration is a bit redundant.  However, as we build-out the
API, this redundancy can disappear.

The main purpose here is to move callback code from the Pipeline and into the
stages that actually issue those callbacks. This removes the back-pointer to
the Pipeline that was put into a few Stage subclasses.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb, courbet

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48576

llvm-svn: 335748
2018-06-27 16:09:33 +00:00
Matt Davis
12a71b7b2e [llvm-mca] Rename Backend to Pipeline. NFC.
Summary:
This change renames the Backend and BackendPrinter to Pipeline and PipelinePrinter respectively. 
Variables and comments have also been updated to reflect this change.

The reason for this rename, is to be slightly more correct about what MCA is modeling.  MCA models a Pipeline, which implies some logical sequence of stages. 

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb, courbet

Subscribers: mgorny, javed.absar, tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48496

llvm-svn: 335496
2018-06-25 16:53:00 +00:00
Matt Davis
85ee56f371 [llvm-mca] Introduce a sequential container of Stages
Summary:
Remove explicit stages and introduce a list of stages.

A pipeline should be composed of an arbitrary list of stages, and not any
 predefined list of stages in the Backend.  The Backend should not know of any
 particular stage, rather it should only be concerned that it has a list of
 stages, and that those stages will fulfill the contract of what it means to be
 a Stage (namely pre/post/execute a given instruction).

For now, we leave the original set of stages defined in the Backend ctor;
however, I imagine these will be moved out at a later time.

This patch makes an adjustment to the semantics of Stage::isReady.
Specifically, what the Backend really needs to know is if a Stage has
unfinished work.  With that said, it is more appropriately renamed
Stage::hasWorkToComplete().  This change will clean up the check in
Backend::run(), allowing us to query each stage to see if there is unfinished
work, regardless of what subclass a stage might be.  I feel that this change
simplifies the semantics too, but that's a subjective statement.

Given how RetireStage and ExecuteStage handle data in their preExecute(), I've
had to change the order of Retire and Execute in our stage list.  Retire must
complete any of its preExecute actions before ExecuteStage's preExecute can
take control.  This is mainly because both stages utilize the RCU.  In the
meantime, I want to see if I can adjust that or remove that coupling.

Reviewers: andreadb, RKSimon, courbet

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D46907

llvm-svn: 335361
2018-06-22 16:17:26 +00:00
Andrea Di Biagio
4893e095df [llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register.
This patch teaches llvm-mca how to identify register writes that implicitly zero
the upper portion of a super-register.

On X86-64, a general purpose register is implemented in hardware as a 64-bit
register. Quoting the Intel 64 Software Developer's Manual: "an update to the
lower 32 bits of a 64 bit integer register is architecturally defined to zero
extend the upper 32 bits".  Also, a write to an XMM register performed by an AVX
instruction implicitly zeroes the upper 128 bits of the aliasing YMM register.

This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis
interface to help identify instructions that implicitly clear the upper portion
of a super-register.  The rest of the patch teaches llvm-mca how to use that new
method to obtain the information, and update the register dependencies
accordingly.

I compared the kernels from tests clear-super-register-1.s and
clear-super-register-2.s against the output from perf on btver2.  Previously
there was a large discrepancy between the estimated IPC and the measured IPC.
Now the differences are mostly in the noise.

Differential Revision: https://reviews.llvm.org/D48225

llvm-svn: 335113
2018-06-20 10:08:11 +00:00
Roman Lebedev
4ec9cfed2e [MCA] Add -summary-view option
Summary:
While that is indeed a quite interesting summary stat,
there are cases where it does not really add anything
other than consuming extra lines.

Declutters the output of D48190.

Reviewers: RKSimon, andreadb, courbet, craig.topper

Reviewed By: andreadb

Subscribers: javed.absar, gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D48209

llvm-svn: 334833
2018-06-15 14:01:43 +00:00
Andrea Di Biagio
b297f1e738 Revert: [llvm-mca] Flush the output stream before we start the analysis of a new code region. NFC
Not sure why, but it breaks buildbot clang-cmake-armv8-full.
It causes a failure in TEST 'Xray-armhf-linux :: TestCases/Posix/profiling-single-threaded.cc'.

llvm-svn: 334617
2018-06-13 16:33:52 +00:00
Andrea Di Biagio
24185018fb [llvm-mca] Flush the output stream before we start the analysis of a new code region. NFC
llvm-svn: 334610
2018-06-13 15:43:56 +00:00
Andrea Di Biagio
d476292a08 [llvm-mca] Print the "Block RThroughput" in the SummaryView.
This patch implements the "block reciprocal throughput" computation in the
SummaryView.

The block reciprocal throughput is computed as the MAX of:
  - NumMicroOps / DispatchWidth
  - Resource Cycles / #Units   (for every resource consumed).

The block throughput is bounded from above by the hardware dispatch throughput.
That is because the DispatchWidth is an upper bound on how many opcodes can be part
of a single dispatch group.

The block throughput is also limited by the amount of hardware parallelism. The
number of available resource units affects how the resource pressure is
distributed, and also how many blocks can be delivered every cycle.

llvm-svn: 333095
2018-05-23 15:59:27 +00:00
Andrea Di Biagio
f5e3a66963 [llvm-mca] Hide unrelated flags from the -help output.
llvm-svn: 332615
2018-05-17 15:35:14 +00:00
Andrea Di Biagio
8bbb45c175 [llvm-mca] add flag -all-views and flag -all-stats.
Flag -all-views enables all the views.
Flag -all-stats enables all the views that print hardware statistics.

llvm-svn: 332602
2018-05-17 12:27:03 +00:00
Matt Davis
efffaf5cb4 [llvm-mca] Introduce a pipeline Stage class and FetchStage.
Summary:
    This is just an idea, really two ideas.  I expect some push-back,
    but I realize that posting a diff is the most comprehensive way to express
    these concepts.

    This patch introduces a Stage class which represents the
    various stages of an instruction pipeline.  As a start, I have created a simple
    FetchStage that is based on existing logic for how MCA produces
    instructions, but now encapsulated in a Stage.  The idea should become more concrete
    once we introduce additional stages.  The idea being, that when a stage completes,
    the next stage in the pipeline will be executed.  Stages are chained together
    as a singly linked list to closely model a real pipeline. For now there is only one stage,
    so the stage-to-stage flow of instructions isn't immediately obvious.

    Eventually, Stage will also handle event notifications, but that functionality
    is not complete, and not destined for this patch.  Ideally, an interested party 
    can register for notifications from a particular stage.  Callbacks will be issued to
    these listeners at various points in the execution of the stage.  
    For now, eventing functionality remains similar to what it has been in mca::Backend. 
    We will be building-up the Stage class as we move on, such as adding debug output.

    This patch also removes the unique_ptr<Instruction> return value from
    InstrBuilder::createInstruction.  An Instruction pointer is still produced,
    but now it's up to the caller to decide how that item should be managed post-allocation
    (e.g., smart pointer).  This allows the Fetch stage to create instructions and
    manage the lifetime of those instructions as it wishes, and not have to be bound to any
    specific managed pointer type.  Other callers of createInstruction might have different 
    requirements, and thus can manage the pointer to fit their needs.  Another idea would be to push the
   ownership to the RCU. 

    Currently, the FetchStage will wrap the Instruction
    pointer in a shared_ptr.  This allows us to remove the Instruction container in
    Backend, which was probably going to disappear, or move, at some point anyways.
    Note that I did run these changes through valgrind, to make sure we are not leaking
    memory.  While the shared_ptr comes with some additional overhead it relieves us
    from having to manage a list of generated instructions, and/or make lookup calls
    to remove the instructions. 

    I realize that both the Stage class and the Instruction pointer management
    (mentioned directly above) are separate but related ideas, and probably should
    land as separate patches; I am happy to do that if either idea is decent.
    The main reason these two ideas are together is that
    Stage::execute() can mutate an InstRef. For the fetch stage, the InstRef is populated
    as the primary action of that stage (execute()).  I didn't want to change the Stage interface
    to support the idea of generating an instruction.  Ideally, instructions are to
    be pushed through the pipeline.  I didn't want to draw too much of a
    specialization just for the fetch stage.  Excuse the word-salad.

Reviewers: andreadb, courbet, RKSimon

Reviewed By: andreadb

Subscribers: llvm-commits, mgorny, javed.absar, tschuett, gbedwell

Differential Revision: https://reviews.llvm.org/D46741

llvm-svn: 332390
2018-05-15 20:21:04 +00:00
Andrea Di Biagio
c6d4a11d75 [llvm-mca] removes flag -instruction-tables from the "View Options" category.
This patch also improves the description of a couple of flags in the view
options. With this change, the -help now specifies which views are enabled by
default.

llvm-svn: 331594
2018-05-05 15:36:47 +00:00