
[llvm-mca][docs] Improve the CommandLine documentation.

This patch replaces all the remaining occurrences of string "MCA" with
":program:`llvm-mca`".  Somehow I missed those strings when I committed r338394.

This patch also improves section "Instruction Dispatch".

llvm-svn: 338881
Author: Andrea Di Biagio
Date:   2018-08-03 12:44:56 +00:00
Parent: e27c5b613c
Commit: 1aca2c2e82


@@ -454,8 +454,8 @@ The ``-all-stats`` command line option enables extra statistics and performance
counters for the dispatch logic, the reorder buffer, the retire control unit,
and the register file.
-Below is an example of ``-all-stats`` output generated by MCA for the
-dot-product example discussed in the previous sections.
+Below is an example of ``-all-stats`` output generated by :program:`llvm-mca`
+for the dot-product example discussed in the previous sections.
.. code-block:: none
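For reference, the extended counters described in this hunk are requested with the ``-all-stats`` flag. A minimal sketch of such an invocation, assuming the three-instruction dot-product kernel from the guide is saved as ``dot-product.s`` and the AMD Jaguar (btver2) model is used as in the surrounding examples; the file name and iteration count are illustrative rather than taken from this patch:

.. code-block:: none

   # Enable every optional statistics view (dispatch logic, schedulers,
   # retire control unit, and register files) in a single run.
   $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
       -iterations=300 -all-stats dot-product.s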
@@ -514,17 +514,16 @@ SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch
logic is unable to dispatch a group of two instructions because the scheduler's
queue is full.
-Looking at the *Dispatch Logic* table, we see that the pipeline was only able
-to dispatch two instructions 51.5% of the time. The dispatch group was limited
-to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
+dispatch two instructions 51.5% of the time. The dispatch group was limited to
+one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
dispatch statistics are displayed by either using the command option
``-all-stats`` or ``-dispatch-stats``.
The next table, *Schedulers*, presents a histogram displaying a count,
representing the number of instructions issued on some number of cycles. In
-this case, of the 610 simulated cycles, single
-instructions were issued 306 times (50.2%) and there were 7 cycles where
-no instructions were issued.
+this case, of the 610 simulated cycles, single instructions were issued 306
+times (50.2%) and there were 7 cycles where no instructions were issued.
The *Scheduler's queue usage* table shows the maximum number of buffer
entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
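The dispatch and scheduler histograms discussed above can also be printed on their own, without the rest of the ``-all-stats`` output. A sketch, reusing the assumed ``dot-product.s`` input from the running example:

.. code-block:: none

   # Print only the dispatch-logic and scheduler statistics views.
   $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
       -dispatch-stats -scheduler-stats dot-product.s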
@@ -543,28 +542,28 @@ A full scheduler queue is either caused by data dependency chains or by a
sub-optimal usage of hardware resources. Sometimes, resource pressure can be
mitigated by rewriting the kernel using different instructions that consume
different scheduler resources. Schedulers with a small queue are less resilient
-to bottlenecks caused by the presence of long data dependencies.
-The scheduler statistics are displayed by
-using the command option ``-all-stats`` or ``-scheduler-stats``.
+to bottlenecks caused by the presence of long data dependencies. The scheduler
+statistics are displayed by using the command option ``-all-stats`` or
+``-scheduler-stats``.
The next table, *Retire Control Unit*, presents a histogram displaying a count,
representing the number of instructions retired on some number of cycles. In
-this case, of the 610 simulated cycles, two instructions were retired during
-the same cycle 399 times (65.4%) and there were 109 cycles where no
-instructions were retired. The retire statistics are displayed by using the
-command option ``-all-stats`` or ``-retire-stats``.
+this case, of the 610 simulated cycles, two instructions were retired during the
+same cycle 399 times (65.4%) and there were 109 cycles where no instructions
+were retired. The retire statistics are displayed by using the command option
+``-all-stats`` or ``-retire-stats``.
The last table presented is *Register File statistics*. Each physical register
file (PRF) used by the pipeline is presented in this table. In the case of AMD
-Jaguar, there are two register files, one for floating-point registers
-(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
-the 900 instructions processed, there were 900 mappings created. Since this
-dot-product example utilized only floating point registers, the JFPuPRF was
-responsible for creating the 900 mappings. However, we see that the pipeline
-only used a maximum of 35 of 72 available register slots at any given time. We
-can conclude that the floating point PRF was the only register file used for
-the example, and that it was never resource constrained. The register file
-statistics are displayed by using the command option ``-all-stats`` or
+Jaguar, there are two register files, one for floating-point registers (JFpuPRF)
+and one for integer registers (JIntegerPRF). The table shows that of the 900
+instructions processed, there were 900 mappings created. Since this dot-product
+example utilized only floating point registers, the JFPuPRF was responsible for
+creating the 900 mappings. However, we see that the pipeline only used a
+maximum of 35 of 72 available register slots at any given time. We can conclude
+that the floating point PRF was the only register file used for the example, and
+that it was never resource constrained. The register file statistics are
+displayed by using the command option ``-all-stats`` or
``-register-file-stats``.
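The retire-control-unit and register-file tables have dedicated flags as well. An example invocation, again assuming the same input file as before:

.. code-block:: none

   # Print the retire-control-unit histogram and the physical register
   # file usage table only.
   $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
       -retire-stats -register-file-stats dot-product.s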
In this example, we can conclude that the IPC is mostly limited by data
@@ -572,8 +571,8 @@ dependencies, and not by resource pressure.
Instruction Flow
^^^^^^^^^^^^^^^^
-This section describes the instruction flow through MCA's default out-of-order
-pipeline, as well as the functional units involved in the process.
+This section describes the instruction flow through the default pipeline of
+:program:`llvm-mca`, as well as the functional units involved in the process.
The default pipeline implements the following sequence of stages used to
process instructions.
@@ -585,9 +584,9 @@ process instructions.
The default pipeline only models the out-of-order portion of a processor.
Therefore, the instruction fetch and decode stages are not modeled. Performance
-bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
-have all been decoded and placed into a queue. Also, MCA does not model branch
-prediction.
+bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
+instructions have all been decoded and placed into a queue before the simulation
+start. Also, :program:`llvm-mca` does not model branch prediction.
Instruction Dispatch
""""""""""""""""""""
@@ -607,19 +606,19 @@ An instruction can be dispatched if:
* The schedulers are not full.
Scheduling models can optionally specify which register files are available on
-the processor. MCA uses that information to initialize register file
-descriptors. Users can limit the number of physical registers that are
+the processor. :program:`llvm-mca` uses that information to initialize register
+file descriptors. Users can limit the number of physical registers that are
globally available for register renaming by using the command option
-``-register-file-size``. A value of zero for this option means *unbounded*.
-By knowing how many registers are available for renaming, MCA can predict
-dispatch stalls caused by the lack of registers.
+``-register-file-size``. A value of zero for this option means *unbounded*. By
+knowing how many registers are available for renaming, the tool can predict
+dispatch stalls caused by the lack of physical registers.
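As a concrete illustration of ``-register-file-size``, the sketch below caps the number of physical registers available for renaming; the value 64 is an arbitrary choice for demonstration, not something prescribed by this patch:

.. code-block:: none

   # Limit renaming to 64 physical registers so that dispatch stalls caused
   # by register pressure show up in the dispatch statistics.
   $ llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 \
       -register-file-size=64 -dispatch-stats dot-product.s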
The number of reorder buffer entries consumed by an instruction depends on the
-number of micro-opcodes specified by the target scheduling model. MCA's
-reorder buffer's purpose is to track the progress of instructions that are
-"in-flight," and to retire instructions in program order. The number of
-entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
-the target scheduling model.
+number of micro-opcodes specified for that instruction by the target scheduling
+model. The reorder buffer is responsible for tracking the progress of
+instructions that are "in-flight", and retiring them in program order. The
+number of entries in the reorder buffer defaults to the value specified by field
+`MicroOpBufferSize` in the target scheduling model.
Instructions that are dispatched to the schedulers consume scheduler buffer
entries. :program:`llvm-mca` queries the scheduling model to determine the set