1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
Commit Graph

43 Commits

Author SHA1 Message Date
James Henderson
f75dcd7d67 [docs][tools] Add missing "program" tags to rst files
Sphinx allows for definitions of command-line options using
`.. option <name>` and references to those options via `:option:<name>`.
However, it looks like there is no scoping of these options by default,
meaning that links can end up pointing to incorrect documents. See for
example the llvm-mca document, which contains references to -o that,
prior to this patch, pointed to a different document. What's worse is
that these links appear to be non-deterministic in which one is picked
(on my machine, some references end up pointing to opt, whereas on the
live docs, they point to llvm-dwarfdump, for example).

The fix is to add the .. program <name> tag. This essentially namespaces
the options (definitions and references) to the named program, ensuring
that the links are kept correct.

Reviwed by: andreadb

Differential Revision: https://reviews.llvm.org/D63873

llvm-svn: 364538
2019-06-27 13:24:46 +00:00
Andrea Di Biagio
c160d26a4d [llvm-mca][docs] clarify how the quality of the perf report is affected by the quality of the scheduling models.
Differential Revision: https://reviews.llvm.org/D63556

llvm-svn: 363830
2019-06-19 16:10:58 +00:00
Matt Davis
25329df600 [Docs] [llvm-mca] Point out a caveat for using llvm-mca markers in source code.
Summary: See: https://bugs.llvm.org/show_bug.cgi?id=42173

Reviewers: andreadb, mattd, RKSimon, spatel

Reviewed By: andreadb

Subscribers: tschuett, gbedwell, llvm-commits, andreadb

Tags: #llvm

Patch by Max Marrone (maxpm)! Thanks!

Differential Revision: https://reviews.llvm.org/D63040

llvm-svn: 362979
2019-06-10 20:38:56 +00:00
Andrea Di Biagio
98e0298cb2 [MCA] Add support for nested and overlapping region markers
This patch fixes PR41523
https://bugs.llvm.org/show_bug.cgi?id=41523

Regions can now nest/overlap provided that they have different names.
Anonymous regions cannot overlap.

Region end markers must specify the region name. The only exception is for when
there is only one user-defined region; in that particular case, the region end
marker doesn't need to specify a name.

Incorrect region end markers are no longer ignored. Instead, the tool reports an
error and we exit with an error code.

Added test cases to verify the new diagnostic error messages.

Updated the llvm-mca docs to reflect this feature change.

Differential Revision: https://reviews.llvm.org/D61676

llvm-svn: 360351
2019-05-09 15:18:09 +00:00
Andrea Di Biagio
c08b465d91 [llvm-mca][scheduler-stats] Print issued micro opcodes per cycle. NFCI
It makes more sense to print out the number of micro opcodes that are issued
every cycle rather than the number of instructions issued per cycle.
This behavior is also consistent with the dispatch-stats: numbers from the two
views can now be easily compared.

llvm-svn: 357919
2019-04-08 16:05:54 +00:00
Andrea Di Biagio
c5a150eca8 [MCA] Highlight kernel bottlenecks in the summary view.
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle.  Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.

This patch teaches how to identify situations where backend pressure increases
due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.

ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.

The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.

About the time complexity:

Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.

The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

llvm-svn: 355308
2019-03-04 11:52:34 +00:00
Andrea Di Biagio
57fcc40fb8 [llvm-mca][View] Improved Retire Control Unit Statistics.
RetireControlUnitStatistics now reports extra information about the ROB and the
avg/maximum number of entries consumed over the entire simulation.

Example:
  Retire Control Unit - number of cycles where we saw N instructions retired:
  [# retired], [# cycles]
   0,           109  (17.9%)
   1,           102  (16.7%)
   2,           399  (65.4%)

  Total ROB Entries:                64
  Max Used ROB Entries:             35  ( 54.7% )
  Average Used ROB Entries per cy:  32  ( 50.0% )

Documentation in llvm/docs/CommandGuide/llvmn-mca.rst has been updated to
reflect this change.

llvm-svn: 347493
2018-11-23 12:12:57 +00:00
Andrea Di Biagio
7e5d9331c7 [llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view.
This patch introduces the following changes to the DispatchStatistics view:
 * DispatchStatistics now reports the number of dispatched opcodes instead of
   the number of dispatched instructions.
 * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of
   stall cycles against the total simulated cycles.

This change allows users to easily compare dispatch group sizes with the
processor DispatchWidth.
Before this change, it was difficult to correlate the two numbers, since
DispatchStatistics view reported numbers of instructions (instead of opcodes).
DispatchWidth defines the maximum size of a dispatch group in terms of number of
micro opcodes.

The other change introduced by this patch is related to how DispatchStage
generates "instruction dispatch" events.
In particular:
 * There can be multiple dispatch events associated with a same instruction
 * Each dispatch event now encapsulates the number of dispatched micro opcodes.

The number of micro opcodes declared by an instruction may exceed the processor
DispatchWidth. Therefore, we cannot assume that instructions are always fully
dispatched in a single cycle.
DispatchStage knows already how to handle instructions declaring a number of
opcodes bigger that DispatchWidth. However, DispatchStage always emitted a
single instruction dispatch event (during the first simulated dispatch cycle)
for instructions dispatched.

With this patch, DispatchStage now correctly notifies multiple dispatch events
for instructions that cannot be dispatched in a single cycle.

A few views had to be modified. Views can no longer assume that there can only
be one dispatch event per instruction.

Tests (and docs) have been updated.

Differential Revision: https://reviews.llvm.org/D51430

llvm-svn: 341055
2018-08-30 10:50:20 +00:00
Andrea Di Biagio
80b01d0203 [llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView.
This patch adds two new fields to the perf report generated by the SummaryView.
Fields are now logically organized into two small groups; only the second group
contains throughput indicators.

Example:
```
Iterations:        100
Instructions:      300
Total Cycles:      414
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    1.69
IPC:               0.72
Block RThroughput: 4.0
```

This patch also updates the docs for llvm-mca.
Due to the nature of this change, several tests in the tools/llvm-mca directory
were affected, and had to be updated using script `update_mca_test_checks.py`.

llvm-svn: 340946
2018-08-29 17:56:39 +00:00
Andrea Di Biagio
f707cd4166 [llvm-mca] Improved report generated by the SchedulerStatistics view.
Before this patch, the SchedulerStatistics only printed the maximum number of
buffer entries consumed in each scheduler's queue at a given point of the
simulation.

This patch restructures the reported table, and adds an extra field named
"Average number of used buffer entries" to it.
This patch also uses different colors to help identifying bottlenecks caused by
high scheduler's buffer pressure.

llvm-svn: 340746
2018-08-27 14:52:52 +00:00
Matt Davis
b0535f09cc [llvm-mca][docs] Move the code marker text into its own subsection. NFC.
Also fixed a few undecorated 'llvm-mca' references to be highlighted
with the 'program' emphasis.

llvm-svn: 338900
2018-08-03 15:56:07 +00:00
Andrea Di Biagio
3d390604c7 [llvm-mca] Speed up the computation of the wait/ready/issued sets in the Scheduler.
This patch is a follow-up to r338702.

We don't need to use a map to model the wait/ready/issued sets. It is much more
efficient to use a vector instead.

This patch gives us an average 7.5% speedup (on top of the ~12% speedup obtained
after r338702).

llvm-svn: 338883
2018-08-03 12:55:28 +00:00
Andrea Di Biagio
1aca2c2e82 [llvm-mca][docs] Improve the CommandLine documentation.
This patch replaces all the remaining occurrences of string "MCA" with
":program:`llvm-mca`".  Somehow I missed those strings when I committed r338394.

This patch also improves section "Instruction Dispatch".

llvm-svn: 338881
2018-08-03 12:44:56 +00:00
Matt Davis
7b7e97dd13 [llvm-mca][docs] Replace "temporary" with "physical registers". NFC.
llvm-svn: 338415
2018-07-31 18:59:46 +00:00
Andrea Di Biagio
4a150d2528 [llvm-mca][docs] Improve the "How LLVM-MCA works" section.
llvm-svn: 338410
2018-07-31 18:19:15 +00:00
Andrea Di Biagio
159d252dca [llvm-mca][docs] Always use llvm-mca in place of MCA.
llvm-svn: 338394
2018-07-31 15:29:10 +00:00
Matt Davis
2fb231bb42 [llvm-mca][docs] Add instruction flow documentation. NFC.
Summary:
This patch mostly copies the existing Instruction Flow, and stage descriptions
from the mca README.  I made a few text tweaks, but no semantic changes,
and made reference to the "default pipeline."  I also removed the internals
references (e.g., reference to class names and header files).  I did leave the
LSUnit name around, but only as an abbreviated word for the load-store unit.


Reviewers: andreadb, courbet, RKSimon, gbedwell, filcab

Reviewed By: andreadb

Subscribers: tschuett, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D49692

llvm-svn: 338319
2018-07-30 22:30:14 +00:00
Matt Davis
c79a134b78 [llvm-mca][docs] Define IPC where it is first mentioned. NFC.
Expand the abbreviation where it is first used, and use IPC elsewhere.

llvm-svn: 337739
2018-07-23 21:10:50 +00:00
Matt Davis
9597587b97 [llvm-mca][docs] Add documentation for the statistic outputs from mca. NFC
Summary: The original text was lifted from the MCA README.  I re-ran the dot-product example and updated the output seen in the docs.  I also added a few paragraphs discussing the instruction issued and retired histograms, as well as discussing the register file stats.

Reviewers: andreadb, RKSimon, courbet, gbedwell, filcab

Reviewed By: andreadb

Subscribers: tschuett

Differential Revision: https://reviews.llvm.org/D49614

llvm-svn: 337648
2018-07-21 18:32:47 +00:00
Matt Davis
3fa9be1fea [llvm-mca][docs] Add Timeline and How MCA works.
For the most part, these changes were from the RFC.  I made a few minor
word/structure changes, but nothing significant.  I also regenerated the
example output, and adjusted the text accordingly.

Differential Revision: https://reviews.llvm.org/D49527

llvm-svn: 337496
2018-07-19 20:33:59 +00:00
Matt Davis
7447fc96e1 [llvm-mca][docs] Revert mca internals docs.
We're going to work on this in a separate review focusing more on documenting
the View and probably removing some of the less-interesting/less-useful pieces.

This reverts r337219,337225

llvm-svn: 337295
2018-07-17 16:11:54 +00:00
Matt Davis
4c4b8e599d [llvm-mca][docs] Add notes about cycle and resource callbacks. NFC.
llvm-svn: 337225
2018-07-16 23:50:53 +00:00
Matt Davis
c3a63ec6c0 [llvm-mca][docs] Initial description of mca internals. NFC
This patch introduces a brief description of the components of MCA.  The main
focus is on Views.   This is a work in progress, and more descriptions will be
introduced later.  I want to flesh-out the Views section more and provide a
detailed description of eventing in MCA.  Eventually a brief code example of a
View should accompany the description.

Also, we should consider moving the MCA internals guide elsewhere at some point.

llvm-svn: 337219
2018-07-16 21:42:58 +00:00
Simon Pilgrim
5d64fe4125 Fix typo in declaring code-block snippet
llvm-svn: 332630
2018-05-17 16:58:42 +00:00
Andrea Di Biagio
74d366f0da [llvm-mca] Add an example showing how to get Intel assembly syntax
Patch by Jeff Muizelaar.

llvm-svn: 332627
2018-05-17 16:48:53 +00:00
Andrea Di Biagio
8bbb45c175 [llvm-mca] add flag -all-views and flag -all-stats.
Flag -all-views enables all the views.
Flag -all-stats enables all the views that print hardware statistics.

llvm-svn: 332602
2018-05-17 12:27:03 +00:00
Andrea Di Biagio
517a29f95b [llvm-mca] Default to the native host cpu if flag -mcpu is not specified.
llvm-svn: 330809
2018-04-25 10:18:25 +00:00
Andrea Di Biagio
9a3d82e1c1 [llvm-mca][CommandGuide] Fix typo in example.
llvm-svn: 330703
2018-04-24 10:09:32 +00:00
Andrea Di Biagio
d9344b8000 [llvm-mca] Renamed BackendStatistics to RetireControlUnitStatistics.
Also, removed flag -verbose in favor of flag -retire-stats.

llvm-svn: 329794
2018-04-11 12:12:53 +00:00
Andrea Di Biagio
b46fe1126f [llvm-mca] Move the logic that prints scheduler statistics from BackendStatistics to its own view.
Added flag -scheduler-stats to print scheduler related statistics.

llvm-svn: 329792
2018-04-11 11:37:46 +00:00
Sanjay Patel
65e7849dc2 [llvm-mca] reorder text
On 2nd reading, putting the C example after the bit about
multiple regions makes this flow better.

llvm-svn: 329732
2018-04-10 18:10:14 +00:00
Sanjay Patel
1ce0d0fab1 [llvm-mca] fix formatting
llvm-svn: 329729
2018-04-10 17:56:24 +00:00
Sanjay Patel
1455c44d73 [llvm-mca] add example workflow for source code
This is copied from Andrea's text in PR36875:
https://bugs.llvm.org/show_bug.cgi?id=36875

As noted there, this is a hack...but it's a good one!
It's important to show potential workflows up-front
with examples, so customers can copy and experiment
with them.

llvm-svn: 329726
2018-04-10 17:49:45 +00:00
Andrea Di Biagio
7af89ed924 [llvm-mca] Move the logic that prints dispatch unit statistics from BackendStatistics to its own view.
This patch moves the logic that collects and analyzes dispatch events to the
DispatchStatistics view.

Added flag -dispatch-stats to print statistics related to the dispatch logic.

llvm-svn: 329708
2018-04-10 14:55:14 +00:00
Andrea Di Biagio
af24ba4a16 [llvm-mca] Increase the default number of iterations to 100.
llvm-svn: 329694
2018-04-10 12:50:03 +00:00
Andrea Di Biagio
66e474bb74 [llvm-mca] Add the ability to mark regions of code for analysis (PR36875)
This patch teaches llvm-mca how to parse code comments in search for special
"markers" used to select regions of code.

Example:

# LLVM-MCA-BEGIN My Code Region
  ....
# LLVM-MCA-END

The MCAsmLexer now delegates to an object of class MCACommentParser (i.e. an
AsmCommentConsumer) the parsing of code comments to search for begin/end code
region markers.

A comment starting with substring "LLVM-MCA-BEGIN" marks the beginning of a new
region of code.  A comment starting with substring "LLVM-MCA-END" marks the end
of the last region.

This implementation doesn't allow regions to overlap. Each region can have a
optional description; internally, each region is identified by a range of source
code locations (SMLoc).

MCInst objects are added to a region R only if the source location for the
MCInst is in the range of locations specified by R.

By default, the tool allocates an implicit "Default" code region which contains
every source location.  See new tests llvm-mca-marker-*.s for a few examples.

A new Backend object is created for every region. So, the analysis is conducted
on every parsed code region.  The final report is the union of the reports
generated for every code region.  Note that empty regions are skipped.

Special "[#] Code Region - ..." strings are used in the report to mark the
portion which is specific to a code region only. For example, see
llvm-mca-markers-5.s.

Differential Revision: https://reviews.llvm.org/D45433

llvm-svn: 329590
2018-04-09 16:39:52 +00:00
Andrea Di Biagio
ee5b4f2814 [documentation][llvm-mca] Update the documentation.
Scheduling models can now describe processor register files and retire control
units. This updates the existing documentation and the README file.

llvm-svn: 329311
2018-04-05 16:42:32 +00:00
Andrea Di Biagio
56f9bc1f61 [llvm-mca] Remove flag -max-retire-per-cycle, and update the docs.
This is done in preparation for D45259.
With D45259, models can specify the size of the reorder buffer, and the retire
throughput directly via tablegen.

llvm-svn: 329274
2018-04-05 11:36:50 +00:00
Andrea Di Biagio
e2bef9a902 [llvm-mca] Move the logic that prints register file statistics to its own view. NFCI
Before this patch, the "BackendStatistics" view was responsible for printing the
register file usage (as well as many other statistics).

Now users can enable register file usage statistics using the command line flag
`-register-file-stats`. By default, the tool doesn't print register file
statistics.

llvm-svn: 329083
2018-04-03 16:46:23 +00:00
Andrea Di Biagio
49f7508452 [llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view.
llvm-svn: 328493
2018-03-26 13:44:54 +00:00
Andrea Di Biagio
7168afd322 [llvm-mca] Update the commandline docs after r328305.
Document that flag -resource-pressure can be used to enable/disable the resource
pressure view. This change should have been part of r328305.

llvm-svn: 328492
2018-03-26 13:21:48 +00:00
Andrea Di Biagio
a309a8b1e0 [llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874)
The goal of this patch is to address most of PR36874.  To fully fix PR36874 we
need to split the "InstructionInfo" view from the "SummaryView". That would make
easy to check the latency and rthroughput as well.

The patch reuses all the logic from ResourcePressureView to print out the
"instruction tables".

We have an entry for every instruction in the input sequence. Each entry reports
the theoretical resource pressure distribution. Resource pressure is uniformly
distributed across all the processor resource units of a group.

At the moment, the backend pipeline is not configurable, so the only way to fix
this is by creating a different driver that simply sends instruction events to
the resource pressure view.  That means, we don't use the Backend interface.
Instead, it is simpler to just have a different code-path for when flag
-instruction-tables is specified.

Once Clement addresses bug 36663, then we can port the "instruction tables"
logic into a stage of our configurable pipeline.

Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag
-instruction-tables to each modified test.

Differential Revision: https://reviews.llvm.org/D44839

llvm-svn: 328487
2018-03-26 12:04:53 +00:00
Andrea Di Biagio
45f0e5261e [llvm-mca] LLVM Machine Code Analyzer.
llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.

llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code in a specific cpu.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resources pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.

This patch is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html

Differential Revision: https://reviews.llvm.org/D43951

llvm-svn: 326998
2018-03-08 13:05:02 +00:00