mirror of https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-23 19:23:23 +01:00
[llvm-mca][docs] Improve the CommandLine documentation.
This patch replaces all the remaining occurrences of string "MCA" with ":program:`llvm-mca`". Somehow I missed those strings when I committed r338394. This patch also improves section "Instruction Dispatch".
llvm-svn: 338881
parent e27c5b613c
commit 1aca2c2e82
@@ -454,8 +454,8 @@ The ``-all-stats`` command line option enables extra statistics and performance
 counters for the dispatch logic, the reorder buffer, the retire control unit,
 and the register file.

-Below is an example of ``-all-stats`` output generated by MCA for the
-dot-product example discussed in the previous sections.
+Below is an example of ``-all-stats`` output generated by :program:`llvm-mca`
+for the dot-product example discussed in the previous sections.

 .. code-block:: none

@@ -514,17 +514,16 @@ SCHEDQ reports 272 cycles. This counter is incremented every time the dispatch
 logic is unable to dispatch a group of two instructions because the scheduler's
 queue is full.

-Looking at the *Dispatch Logic* table, we see that the pipeline was only able
-to dispatch two instructions 51.5% of the time. The dispatch group was limited
-to one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
+Looking at the *Dispatch Logic* table, we see that the pipeline was only able to
+dispatch two instructions 51.5% of the time. The dispatch group was limited to
+one instruction 44.6% of the cycles, which corresponds to 272 cycles. The
 dispatch statistics are displayed by either using the command option
 ``-all-stats`` or ``-dispatch-stats``.

 The next table, *Schedulers*, presents a histogram displaying a count,
 representing the number of instructions issued on some number of cycles. In
-this case, of the 610 simulated cycles, single
-instructions were issued 306 times (50.2%) and there were 7 cycles where
-no instructions were issued.
+this case, of the 610 simulated cycles, single instructions were issued 306
+times (50.2%) and there were 7 cycles where no instructions were issued.

 The *Scheduler's queue usage* table shows that the maximum number of buffer
 entries (i.e., scheduler queue entries) used at runtime. Resource JFPU01
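Each percentage quoted in the hunk above is a cycle count divided by the 610 simulated cycles of the dot-product run. The Python sketch below reproduces that arithmetic using only counts stated in the text; the helper ``cycle_share`` is a hypothetical name chosen for illustration and is not part of :program:`llvm-mca`.

.. code-block:: python

    TOTAL_CYCLES = 610  # simulated cycles in the dot-product example


    def cycle_share(count: int, total: int = TOTAL_CYCLES) -> str:
        """Hypothetical helper: count/total as a percentage, one decimal place."""
        if total <= 0:
            raise ValueError("total must be positive")
        return f"{100.0 * count / total:.1f}%"


    # Counts taken from the statistics discussed in the text.
    print(cycle_share(272))  # 44.6%  (dispatch group limited to one instruction)
    print(cycle_share(306))  # 50.2%  (cycles where a single instruction was issued)
    print(cycle_share(7))    # 1.1%   (cycles where no instruction was issued)

The same division explains the later retire figure: 399 of 610 cycles is 65.4%.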
@@ -543,28 +542,28 @@ A full scheduler queue is either caused by data dependency chains or by a
 sub-optimal usage of hardware resources. Sometimes, resource pressure can be
 mitigated by rewriting the kernel using different instructions that consume
 different scheduler resources. Schedulers with a small queue are less resilient
-to bottlenecks caused by the presence of long data dependencies.
-The scheduler statistics are displayed by
-using the command option ``-all-stats`` or ``-scheduler-stats``.
+to bottlenecks caused by the presence of long data dependencies. The scheduler
+statistics are displayed by using the command option ``-all-stats`` or
+``-scheduler-stats``.

 The next table, *Retire Control Unit*, presents a histogram displaying a count,
 representing the number of instructions retired on some number of cycles. In
-this case, of the 610 simulated cycles, two instructions were retired during
-the same cycle 399 times (65.4%) and there were 109 cycles where no
-instructions were retired. The retire statistics are displayed by using the
-command option ``-all-stats`` or ``-retire-stats``.
+this case, of the 610 simulated cycles, two instructions were retired during the
+same cycle 399 times (65.4%) and there were 109 cycles where no instructions
+were retired. The retire statistics are displayed by using the command option
+``-all-stats`` or ``-retire-stats``.

 The last table presented is *Register File statistics*. Each physical register
 file (PRF) used by the pipeline is presented in this table. In the case of AMD
-Jaguar, there are two register files, one for floating-point registers
-(JFpuPRF) and one for integer registers (JIntegerPRF). The table shows that of
-the 900 instructions processed, there were 900 mappings created. Since this
-dot-product example utilized only floating point registers, the JFPuPRF was
-responsible for creating the 900 mappings. However, we see that the pipeline
-only used a maximum of 35 of 72 available register slots at any given time. We
-can conclude that the floating point PRF was the only register file used for
-the example, and that it was never resource constrained. The register file
-statistics are displayed by using the command option ``-all-stats`` or
+Jaguar, there are two register files, one for floating-point registers (JFpuPRF)
+and one for integer registers (JIntegerPRF). The table shows that of the 900
+instructions processed, there were 900 mappings created. Since this dot-product
+example utilized only floating point registers, the JFPuPRF was responsible for
+creating the 900 mappings. However, we see that the pipeline only used a
+maximum of 35 of 72 available register slots at any given time. We can conclude
+that the floating point PRF was the only register file used for the example, and
+that it was never resource constrained. The register file statistics are
+displayed by using the command option ``-all-stats`` or
 ``-register-file-stats``.

 In this example, we can conclude that the IPC is mostly limited by data
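The register file figures discussed above (900 mappings created, at most 35 of 72 registers in use) can be read as two counters kept per physical register file. The toy Python model below illustrates them. It is a sketch under a deliberately simplified assumption: one physical register is allocated per register write at dispatch and released when the writing instruction retires; this is not claimed to be the exact release policy of :program:`llvm-mca`, and the class ``PhysRegFile`` is hypothetical.

.. code-block:: python

    class PhysRegFile:
        """Toy physical register file (hypothetical, not an llvm-mca class)."""

        def __init__(self, name: str, size: int) -> None:
            self.name = name
            self.size = size            # 0 means unbounded
            self.in_use = 0             # registers currently allocated
            self.max_in_use = 0         # peak occupancy seen so far
            self.mappings_created = 0   # total allocations ever made

        def can_allocate(self, n: int = 1) -> bool:
            return self.size == 0 or self.in_use + n <= self.size

        def allocate(self, n: int = 1) -> None:
            # Assumption: called once per register write at dispatch.
            if not self.can_allocate(n):
                raise RuntimeError(f"{self.name}: no free physical registers")
            self.in_use += n
            self.mappings_created += n
            self.max_in_use = max(self.max_in_use, self.in_use)

        def release(self, n: int = 1) -> None:
            # Assumption: called when the writing instruction retires.
            if n > self.in_use:
                raise RuntimeError(f"{self.name}: releasing more than allocated")
            self.in_use -= n


    fpu = PhysRegFile("JFpuPRF", size=72)
    for _ in range(3):
        fpu.allocate()
    fpu.release()
    fpu.allocate()
    print(fpu.mappings_created, fpu.max_in_use)  # 4 3

In these terms, the Jaguar table reports ``mappings_created == 900`` and ``max_in_use == 35`` for a file of size 72; a peak well below the size is what indicates that the file never constrained dispatch.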
@@ -572,8 +571,8 @@ dependencies, and not by resource pressure.

 Instruction Flow
 ^^^^^^^^^^^^^^^^
-This section describes the instruction flow through MCA's default out-of-order
-pipeline, as well as the functional units involved in the process.
+This section describes the instruction flow through the default pipeline of
+:program:`llvm-mca`, as well as the functional units involved in the process.

 The default pipeline implements the following sequence of stages used to
 process instructions.
@@ -585,9 +584,9 @@ process instructions.

 The default pipeline only models the out-of-order portion of a processor.
 Therefore, the instruction fetch and decode stages are not modeled. Performance
-bottlenecks in the frontend are not diagnosed. MCA assumes that instructions
-have all been decoded and placed into a queue. Also, MCA does not model branch
-prediction.
+bottlenecks in the frontend are not diagnosed. :program:`llvm-mca` assumes that
+instructions have all been decoded and placed into a queue before the simulation
+start. Also, :program:`llvm-mca` does not model branch prediction.

 Instruction Dispatch
 """"""""""""""""""""
@@ -607,19 +606,19 @@ An instruction can be dispatched if:
 * The schedulers are not full.

 Scheduling models can optionally specify which register files are available on
-the processor. MCA uses that information to initialize register file
-descriptors. Users can limit the number of physical registers that are
+the processor. :program:`llvm-mca` uses that information to initialize register
+file descriptors. Users can limit the number of physical registers that are
 globally available for register renaming by using the command option
-``-register-file-size``. A value of zero for this option means *unbounded*.
-By knowing how many registers are available for renaming, MCA can predict
-dispatch stalls caused by the lack of registers.
+``-register-file-size``. A value of zero for this option means *unbounded*. By
+knowing how many registers are available for renaming, the tool can predict
+dispatch stalls caused by the lack of physical registers.

 The number of reorder buffer entries consumed by an instruction depends on the
-number of micro-opcodes specified by the target scheduling model. MCA's
-reorder buffer's purpose is to track the progress of instructions that are
-"in-flight," and to retire instructions in program order. The number of
-entries in the reorder buffer defaults to the `MicroOpBufferSize` provided by
-the target scheduling model.
+number of micro-opcodes specified for that instruction by the target scheduling
+model. The reorder buffer is responsible for tracking the progress of
+instructions that are "in-flight", and retiring them in program order. The
+number of entries in the reorder buffer defaults to the value specified by field
+`MicroOpBufferSize` in the target scheduling model.

 Instructions that are dispatched to the schedulers consume scheduler buffer
 entries. :program:`llvm-mca` queries the scheduling model to determine the set
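The dispatch rules described in the hunk above can be condensed into a single check. The Python sketch below is a simplified, hypothetical model (``Instr`` and ``DispatchState`` are not :program:`llvm-mca` classes): an instruction needs one reorder buffer entry per micro-opcode, one physical register per register write unless the register file size is 0 (*unbounded*, as with ``-register-file-size=0``), and a free scheduler queue slot. Charging exactly one scheduler entry per instruction is an assumption made for brevity; the real tool consults the scheduling model for the resources an instruction consumes.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Instr:
        num_uops: int        # micro-opcodes, as given by the scheduling model
        num_reg_writes: int  # register writes that need renaming


    @dataclass
    class DispatchState:
        rob_size: int          # llvm-mca defaults this to MicroOpBufferSize
        reg_file_size: int     # 0 means unbounded
        sched_queue_size: int  # hypothetical single scheduler queue
        rob_used: int = 0
        regs_used: int = 0
        sched_used: int = 0

        def can_dispatch(self, ins: Instr) -> bool:
            rob_ok = self.rob_used + ins.num_uops <= self.rob_size
            regs_ok = (self.reg_file_size == 0
                       or self.regs_used + ins.num_reg_writes <= self.reg_file_size)
            sched_ok = self.sched_used < self.sched_queue_size  # scheduler not full
            return rob_ok and regs_ok and sched_ok

        def dispatch(self, ins: Instr) -> None:
            if not self.can_dispatch(ins):
                raise RuntimeError("dispatch stall")
            self.rob_used += ins.num_uops
            self.regs_used += ins.num_reg_writes
            self.sched_used += 1  # simplification: one entry per instruction


    # Hypothetical 4-entry reorder buffer: two 2-uop instructions fill it.
    state = DispatchState(rob_size=4, reg_file_size=0, sched_queue_size=8)
    state.dispatch(Instr(num_uops=2, num_reg_writes=1))
    state.dispatch(Instr(num_uops=2, num_reg_writes=1))
    print(state.can_dispatch(Instr(num_uops=1, num_reg_writes=1)))  # False: ROB is full

Keeping each condition as a separate boolean makes it easy to attribute a stall to its cause, which mirrors how the dispatch statistics break stalls down by resource.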
|