mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00

[llvm-mca][docs] Improve the CommandLine documentation.

This patch replaces all the remaining occurrences of string "MCA" with
":program:`llvm-mca`".  Somehow I missed those strings when I committed r338394.

This patch also improves section "Instruction Dispatch".

llvm-svn: 338881
Andrea Di Biagio 2018-08-03 12:44:56 +00:00
parent e27c5b613c
commit 1aca2c2e82


@@ -454,8 +454,8 @@ The ``-all-stats`` command line option enables extra statistics and performance
 counters for the dispatch logic, the reorder buffer, the retire control unit,
 and the register file.

-Below is an example of ``-all-stats`` output generated by MCA for the
-dot-product example discussed in the previous sections.
+Below is an example of ``-all-stats`` output generated by :program:`llvm-mca`
+for the dot-product example discussed in the previous sections.

 .. code-block:: none

@@ -514,17 +514,16 @@ SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch
 logic is unable to dispatch a group of two instructions because the scheduler's
 queue is full.

-Looking at the *Dispatch Logic* table, we see that the pipeline was only able
-to dispatch two instructions 51.5% of the time. The dispatch group was limited
-to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
+dispatch two instructions 51.5% of the time. The dispatch group was limited to
+one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
 dispatch statistics are displayed by either using the command option
 ``-all-stats`` or ``-dispatch-stats``.

 The next table, *Schedulers*, presents a histogram displaying a count,
 representing the number of instructions issued on some number of cycles. In
-this case, of the 610 simulated cycles, single
-instructions were issued 306 times (50.2%) and there were 7 cycles where
-no instructions were issued.
+this case, of the 610 simulated cycles, single instructions were issued 306
+times (50.2%) and there were 7 cycles where no instructions were issued.

 The *Scheduler's queue usage* table shows that the maximum number of buffer
 entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
@@ -543,28 +542,28 @@ A full scheduler queue is either caused by data dependency chains or by a
 sub-optimal usage of hardware resources. Sometimes, resource pressure can be
 mitigated by rewriting the kernel using different instructions that consume
 different scheduler resources. Schedulers with a small queue are less resilient
-to bottlenecks caused by the presence of long data dependencies.
-The scheduler statistics are displayed by
-using the command option ``-all-stats`` or ``-scheduler-stats``.
+to bottlenecks caused by the presence of long data dependencies. The scheduler
+statistics are displayed by using the command option ``-all-stats`` or
+``-scheduler-stats``.

 The next table, *Retire Control Unit*, presents a histogram displaying a count,
 representing the number of instructions retired on some number of cycles. In
-this case, of the 610 simulated cycles, two instructions were retired during
-the same cycle 399 times (65.4%) and there were 109 cycles where no
-instructions were retired. The retire statistics are displayed by using the
-command option ``-all-stats`` or ``-retire-stats``.
+this case, of the 610 simulated cycles, two instructions were retired during the
+same cycle 399 times (65.4%) and there were 109 cycles where no instructions
+were retired. The retire statistics are displayed by using the command option
+``-all-stats`` or ``-retire-stats``.

 The last table presented is *Register File statistics*. Each physical register
 file (PRF) used by the pipeline is presented in this table. In the case of AMD
-Jaguar, there are two register files, one for floating-point registers
-(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
-the 900 instructions processed, there were 900 mappings created. Since this
-dot-product example utilized only floating point registers, the JFPuPRF was
-responsible for creating the 900 mappings. However, we see that the pipeline
-only used a maximum of 35 of 72 available register slots at any given time. We
-can conclude that the floating point PRF was the only register file used for
-the example, and that it was never resource constrained. The register file
-statistics are displayed by using the command option ``-all-stats`` or
+Jaguar, there are two register files, one for floating-point registers (JFpuPRF)
+and one for integer registers (JIntegerPRF). The table shows that of the 900
+instructions processed, there were 900 mappings created. Since this dot-product
+example utilized only floating point registers, the JFPuPRF was responsible for
+creating the 900 mappings. However, we see that the pipeline only used a
+maximum of 35 of 72 available register slots at any given time. We can conclude
+that the floating point PRF was the only register file used for the example, and
+that it was never resource constrained. The register file statistics are
+displayed by using the command option ``-all-stats`` or
 ``-register-file-stats``.

 In this example, we can conclude that the IPC is mostly limited by data
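
The percentages quoted in the documentation text of the two hunks above can be cross-checked against the 610 simulated cycles. The following is a minimal Python sketch and is not part of the patch; the helper name `share_of_cycles` and the dictionary labels are illustrative assumptions, while the cycle counts and percentages are copied from the documentation text.

```python
# Cross-check of the figures quoted in the llvm-mca documentation text.
# The helper name and labels are hypothetical; only the numbers come from the text.
TOTAL_CYCLES = 610


def share_of_cycles(count, total=TOTAL_CYCLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# (cycle count, percentage stated in the documentation)
quoted = {
    "dispatch group limited to one instruction": (272, 44.6),
    "single instruction issued": (306, 50.2),
    "two instructions retired in the same cycle": (399, 65.4),
}

for label, (cycles, documented_pct) in quoted.items():
    computed = share_of_cycles(cycles)
    print(f"{label}: {cycles}/{TOTAL_CYCLES} = {computed}% "
          f"(documented: {documented_pct}%)")

# Peak floating-point register usage: 35 of the 72 available slots.
peak_fp_usage = share_of_cycles(35, total=72)
print(f"peak JFpuPRF usage: {peak_fp_usage}% of available slots")
```

The peak usage stays below half of the 72 available slots, which is consistent with the text's conclusion that the floating-point register file was never resource constrained.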
@@ -572,8 +571,8 @@ dependencies, and not by resource pressure.

 Instruction Flow
 ^^^^^^^^^^^^^^^^
-This section describes the instruction flow through MCA's default out-of-order
-pipeline, as well as the functional units involved in the process.
+This section describes the instruction flow through the default pipeline of
+:program:`llvm-mca`, as well as the functional units involved in the process.

 The default pipeline implements the following sequence of stages used to
 process instructions.
@@ -585,9 +584,9 @@ process instructions.

 The default pipeline only models the out-of-order portion of a processor.
 Therefore, the instruction fetch and decode stages are not modeled. Performance
-bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
-have all been decoded and placed into a queue. Also, MCA does not model branch
-prediction.
+bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
+instructions have all been decoded and placed into a queue before the simulation
+start. Also, :program:`llvm-mca` does not model branch prediction.

 Instruction Dispatch
 """"""""""""""""""""
@@ -607,19 +606,19 @@ An instruction can be dispatched if:
 * The schedulers are not full.

 Scheduling models can optionally specify which register files are available on
-the processor. MCA uses that information to initialize register file
-descriptors. Users can limit the number of physical registers that are
+the processor. :program:`llvm-mca` uses that information to initialize register
+file descriptors. Users can limit the number of physical registers that are
 globally available for register renaming by using the command option
-``-register-file-size``. A value of zero for this option means *unbounded*.
-By knowing how many registers are available for renaming, MCA can predict
-dispatch stalls caused by the lack of registers.
+``-register-file-size``. A value of zero for this option means *unbounded*. By
+knowing how many registers are available for renaming, the tool can predict
+dispatch stalls caused by the lack of physical registers.

 The number of reorder buffer entries consumed by an instruction depends on the
-number of micro-opcodes specified by the target scheduling model. MCA's
-reorder buffer's purpose is to track the progress of instructions that are
-"in-flight," and to retire instructions in program order. The number of
-entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
-the target scheduling model.
+number of micro-opcodes specified for that instruction by the target scheduling
+model. The reorder buffer is responsible for tracking the progress of
+instructions that are "in-flight", and retiring them in program order. The
+number of entries in the reorder buffer defaults to the value specified by field
+`MicroOpBufferSize` in the target scheduling model.

 Instructions that are dispatched to the schedulers consume scheduler buffer
 entries. :program:`llvm-mca` queries the scheduling model to determine the set