Mirror of https://github.com/RPCS3/llvm-mirror.git (synced 2024-11-23 19:23:23 +01:00)
[llvm-mca][docs] Improve the CommandLine documentation.

This patch replaces all the remaining occurrences of string "MCA" with ":program:`llvm-mca`". Somehow I missed those strings when I committed r338394.

This patch also improves section "Instruction Dispatch".

llvm-svn: 338881
parent e27c5b613c
commit 1aca2c2e82
@@ -454,8 +454,8 @@ The ``-all-stats`` command line option enables extra statistics and performance
 counters for the dispatch logic, the reorder buffer, the retire control unit,
 and the register file.
 
-Below is an example of ``-all-stats`` output generated by MCA for the
-dot-product example discussed in the previous sections.
+Below is an example of ``-all-stats`` output generated by :program:`llvm-mca`
+for the dot-product example discussed in the previous sections.
 
 .. code-block:: none
 
@@ -514,17 +514,16 @@ SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch
 logic is unable to dispatch a group of two instructions because the scheduler's
 queue is full.
 
-Looking at the *Dispatch Logic* table, we see that the pipeline was only able
-to dispatch two instructions 51.5% of the time. The dispatch group was limited
-to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
+dispatch two instructions 51.5% of the time. The dispatch group was limited to
+one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
 dispatch statistics are displayed by either using the command option
 ``-all-stats`` or ``-dispatch-stats``.
 
 The next table, *Schedulers*, presents a histogram displaying a count,
 representing the number of instructions issued on some number of cycles. In
-this case, of the 610 simulated cycles, single
-instructions were issued 306 times (50.2%) and there were 7 cycles where
-no instructions were issued.
+this case, of the 610 simulated cycles, single instructions were issued 306
+times (50.2%) and there were 7 cycles where no instructions were issued.
 
 The *Scheduler's queue usage* table shows that the maximum number of buffer
 entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
@@ -543,28 +542,28 @@ A full scheduler queue is either caused by data dependency chains or by a
 sub-optimal usage of hardware resources. Sometimes, resource pressure can be
 mitigated by rewriting the kernel using different instructions that consume
 different scheduler resources. Schedulers with a small queue are less resilient
-to bottlenecks caused by the presence of long data dependencies.
-The scheduler statistics are displayed by
-using the command option ``-all-stats`` or ``-scheduler-stats``.
+to bottlenecks caused by the presence of long data dependencies. The scheduler
+statistics are displayed by using the command option ``-all-stats`` or
+``-scheduler-stats``.
 
 The next table, *Retire Control Unit*, presents a histogram displaying a count,
 representing the number of instructions retired on some number of cycles. In
-this case, of the 610 simulated cycles, two instructions were retired during
-the same cycle 399 times (65.4%) and there were 109 cycles where no
-instructions were retired. The retire statistics are displayed by using the
-command option ``-all-stats`` or ``-retire-stats``.
+this case, of the 610 simulated cycles, two instructions were retired during the
+same cycle 399 times (65.4%) and there were 109 cycles where no instructions
+were retired. The retire statistics are displayed by using the command option
+``-all-stats`` or ``-retire-stats``.
 
 The last table presented is *Register File statistics*. Each physical register
 file (PRF) used by the pipeline is presented in this table. In the case of AMD
-Jaguar, there are two register files, one for floating-point registers
-(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
-the 900 instructions processed, there were 900 mappings created. Since this
-dot-product example utilized only floating point registers, the JFPuPRF was
-responsible for creating the 900 mappings. However, we see that the pipeline
-only used a maximum of 35 of 72 available register slots at any given time. We
-can conclude that the floating point PRF was the only register file used for
-the example, and that it was never resource constrained. The register file
-statistics are displayed by using the command option ``-all-stats`` or
+Jaguar, there are two register files, one for floating-point registers (JFpuPRF)
+and one for integer registers (JIntegerPRF). The table shows that of the 900
+instructions processed, there were 900 mappings created. Since this dot-product
+example utilized only floating point registers, the JFPuPRF was responsible for
+creating the 900 mappings. However, we see that the pipeline only used a
+maximum of 35 of 72 available register slots at any given time. We can conclude
+that the floating point PRF was the only register file used for the example, and
+that it was never resource constrained. The register file statistics are
+displayed by using the command option ``-all-stats`` or
 ``-register-file-stats``.
 
 In this example, we can conclude that the IPC is mostly limited by data
@@ -572,8 +571,8 @@ dependencies, and not by resource pressure.
 
 Instruction Flow
 ^^^^^^^^^^^^^^^^
-This section describes the instruction flow through MCA's default out-of-order
-pipeline, as well as the functional units involved in the process.
+This section describes the instruction flow through the default pipeline of
+:program:`llvm-mca`, as well as the functional units involved in the process.
 
 The default pipeline implements the following sequence of stages used to
 process instructions.
@@ -585,9 +584,9 @@ process instructions.
 
 The default pipeline only models the out-of-order portion of a processor.
 Therefore, the instruction fetch and decode stages are not modeled. Performance
-bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
-have all been decoded and placed into a queue. Also, MCA does not model branch
-prediction.
+bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
+instructions have all been decoded and placed into a queue before the simulation
+start. Also, :program:`llvm-mca` does not model branch prediction.
 
 Instruction Dispatch
 """"""""""""""""""""
@@ -607,19 +606,19 @@ An instruction can be dispatched if:
 * The schedulers are not full.
 
 Scheduling models can optionally specify which register files are available on
-the processor. MCA uses that information to initialize register file
-descriptors. Users can limit the number of physical registers that are
+the processor. :program:`llvm-mca` uses that information to initialize register
+file descriptors. Users can limit the number of physical registers that are
 globally available for register renaming by using the command option
-``-register-file-size``. A value of zero for this option means *unbounded*.
-By knowing how many registers are available for renaming, MCA can predict
-dispatch stalls caused by the lack of registers.
+``-register-file-size``. A value of zero for this option means *unbounded*. By
+knowing how many registers are available for renaming, the tool can predict
+dispatch stalls caused by the lack of physical registers.
 
 The number of reorder buffer entries consumed by an instruction depends on the
-number of micro-opcodes specified by the target scheduling model. MCA's
-reorder buffer's purpose is to track the progress of instructions that are
-"in-flight," and to retire instructions in program order. The number of
-entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
-the target scheduling model.
+number of micro-opcodes specified for that instruction by the target scheduling
+model. The reorder buffer is responsible for tracking the progress of
+instructions that are "in-flight", and retiring them in program order. The
+number of entries in the reorder buffer defaults to the value specified by field
+`MicroOpBufferSize` in the target scheduling model.
 
 Instructions that are dispatched to the schedulers consume scheduler buffer
 entries. :program:`llvm-mca` queries the scheduling model to determine the set
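
Note on the figures quoted in the documentation text of this diff: the percentages can be recomputed from the cycle counts as a quick consistency check. The following is a minimal sketch, not part of the commit. The helper name ``share`` and the variable names are made up for illustration, and the only inputs are numbers quoted in the text (610 simulated cycles; 272, 306 and 399 event cycles; 35 of 72 register slots).

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Total number of simulated cycles quoted in the documentation text.
TOTAL_CYCLES = 610

# Cycles in which the dispatch group was limited to one instruction.
dispatch_limited = share(272, TOTAL_CYCLES)

# Cycles in which a single instruction was issued.
single_issue = share(306, TOTAL_CYCLES)

# Cycles in which two instructions were retired together.
dual_retire = share(399, TOTAL_CYCLES)

# Peak usage of the floating point register file (35 of 72 slots).
# This percentage is derived here; it is not quoted in the text.
peak_prf_usage = share(35, 72)

print(f"dispatch limited to one instruction: {dispatch_limited}%")
print(f"single-instruction issue cycles:     {single_issue}%")
print(f"dual-retire cycles:                  {dual_retire}%")
print(f"peak floating point PRF usage:       {peak_prf_usage}%")
```

The first three values agree with the 44.6%, 50.2% and 65.4% stated in the text. The last one shows that peak register usage stayed below half of the available slots, which is consistent with the conclusion that the floating point PRF was never resource constrained.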