1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 11:02:59 +02:00
Commit Graph

33 Commits

Author SHA1 Message Date
Andrea Di Biagio
9c3eda58cd [MCA] Zero-initialize field CRD in InstructionBase. Also run clang-format on a couple of files. NFC
llvm-svn: 361637
2019-05-24 13:56:01 +00:00
Andrea Di Biagio
d18dde10f1 [MCA] Add the ability to compute critical register dependency of an instruction.
This patch adds the methods `getCriticalRegDep()` and `computeCriticalRegDep()` to
class InstructionBase.
The goal is to allow users to obtain information about the critical register
dependency that most affects the latency of an instruction.

These methods are currently unused. However, the long term plan is to use them
in order to allow the computation of a critical-path as part of the bottleneck
analysis. So, this is yet another step towards fixing PR37494.

llvm-svn: 361509
2019-05-23 16:32:19 +00:00
Andrea Di Biagio
aba464655a [MCA] Introduce class LSUnitBase and let LSUnit derive from it.
Class LSUnitBase provides a abstract interface for all the concrete LS units in
llvm-mca.

Methods exposed by the public abstract LSUnitBase interface are:
 - Status isAvailable(const InstRef&);
 - void dispatch(const InstRef &);
 - const InstRef &isReady(const InstRef &);

LSUnitBase standardises the API, but not the data structures internally used by
LS units. This allows for more flexibility.
Previously, only method `isReady()` was declared virtual by class LSUnit.
Also, derived classes had to inherit all the internal data members of LSUnit.

No functional change intended.

llvm-svn: 361496
2019-05-23 13:42:47 +00:00
Andrea Di Biagio
c10551d172 [MCA] Make the bool conversion operator in class InstRef explicit. NFCI
This patch makes the bool conversion operator in InstRef explicit.
It also adds a operator< to hel comparing InstRef objects in sets.

llvm-svn: 361482
2019-05-23 10:50:01 +00:00
Andrea Di Biagio
c70f11adf6 [MCA] Notify event listeners when instructions transition to the Pending state. NFCI
llvm-svn: 359983
2019-05-05 16:07:27 +00:00
Andrea Di Biagio
16d23b6c1c [MCA] Add field IsEliminated to class Instruction. NFCI
llvm-svn: 359377
2019-04-27 11:59:11 +00:00
Andrea Di Biagio
873c7239ab [MCA] Add an experimental MicroOpQueue stage.
This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch').  Users can specify a queue
size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).

This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.

Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput.  That flag allows us to quickly experiment with different
frontend throughputs.  For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.

This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.

Differential Revision: https://reviews.llvm.org/D59928

llvm-svn: 357248
2019-03-29 12:15:37 +00:00
Andrea Di Biagio
a1cb49539d [MCA] Correctly update the UsedResourceGroups mask in the InstrBuilder.
Found by inspection when looking at the debug output of MCA.
This problem was latent, and none of the upstream models were affected by it.
No functional change intended.

llvm-svn: 357000
2019-03-26 15:38:37 +00:00
Andrea Di Biagio
0de499d982 [MCA] Remove unused methods. NFC
llvm-svn: 355314
2019-03-04 13:34:56 +00:00
Andrea Di Biagio
c5a150eca8 [MCA] Highlight kernel bottlenecks in the summary view.
This patch adds a new flag named -bottleneck-analysis to print out information
about throughput bottlenecks.

MCA knows how to identify and classify dynamic dispatch stalls. However, it
doesn't know how to analyze and highlight kernel bottlenecks.  The goal of this
patch is to teach MCA how to correlate increases in backend pressure to backend
stalls (and therefore, the loss of throughput).

From a Scheduler point of view, backend pressure is a function of the scheduler
buffer usage (i.e. how the number of uOps in the scheduler buffers changes over
time). Backend pressure increases (or decreases) when there is a mismatch
between the number of opcodes dispatched, and the number of opcodes issued in
the same cycle.  Since buffer resources are limited, continuous increases in
backend pressure would eventually leads to dispatch stalls. So, there is a
strong correlation between dispatch stalls, and how backpressure changed over
time.

This patch teaches how to identify situations where backend pressure increases
due to:
 - unavailable pipeline resources.
 - data dependencies.

Data dependencies may delay execution of instructions and therefore increase the
time that uOps have to spend in the scheduler buffers. That often translates to
an increase in backend pressure which may eventually lead to a bottleneck.
Contention on pipeline resources may also delay execution of instructions, and
lead to a temporary increase in backend pressure.

Internally, the Scheduler classifies instructions based on whether register /
memory operands are available or not.

An instruction is marked as "ready to execute" only if data dependencies are
fully resolved.
Every cycle, the Scheduler attempts to execute all instructions that are ready
to execute. If an instruction cannot execute because of unavailable pipeline
resources, then the Scheduler internally updates a BusyResourceUnits mask with
the ID of each unavailable resource.

ExecuteStage is responsible for tracking changes in backend pressure. If backend
pressure increases during a cycle because of contention on pipeline resources,
then ExecuteStage sends a "backend pressure" event to the listeners.
That event would contain information about instructions delayed by resource
pressure, as well as the BusyResourceUnits mask.

Note that ExecuteStage also knows how to identify situations where backpressure
increased because of delays introduced by data dependencies.

The SummaryView observes "backend pressure" events and prints out a "bottleneck
report".

Example of bottleneck report:

```
Cycles with backend pressure increase [ 99.89% ]
Throughput Bottlenecks:
  Resource Pressure       [ 0.00% ]
  Data Dependencies:      [ 99.89% ]
   - Register Dependencies [ 0.00% ]
   - Memory Dependencies   [ 99.89% ]
```

A bottleneck report is printed out only if increases in backend pressure
eventually caused backend stalls.

About the time complexity:

Time complexity is linear in the number of instructions in the
Scheduler::PendingSet.

The average slowdown tends to be in the range of ~5-6%.
For memory intensive kernels, the slowdown can be significant if flag
-noalias=false is specified. In the worst case scenario I have observed a
slowdown of ~30% when flag -noalias=false was specified.

We can definitely recover part of that slowdown if we optimize class LSUnit (by
doing extra bookkeeping to speedup queries). For now, this new analysis is
disabled by default, and it can be enabled via flag -bottleneck-analysis. Users
of MCA as a library can enable the generation of pressure events through the
constructor of ExecuteStage.

This patch partially addresses https://bugs.llvm.org/show_bug.cgi?id=37494

Differential Revision: https://reviews.llvm.org/D58728

llvm-svn: 355308
2019-03-04 11:52:34 +00:00
Andrea Di Biagio
70b9d172d7 [MCA] Always check if scheduler resources are unavailable when reporting dispatch stalls.
Dispatch stall cycles may be associated to multiple dispatch stall events.
Before this patch, each stall cycle was associated with a single stall event.
This patch also improves a couple of code comments, and adds a helper method to
query the Scheduler for dispatch stalls.

llvm-svn: 354877
2019-02-26 14:19:00 +00:00
Andrea Di Biagio
f1e15605d5 [MCA][Scheduler] Correctly initialize field NumDispatchedToThePendingSet.
This should have been part of r354490.

llvm-svn: 354493
2019-02-20 18:23:19 +00:00
Andrea Di Biagio
c8095b709a [MCA][Scheduler] Collect resource pressure and memory dependency bottlenecks.
Every cycle, the Scheduler checks if instructions in the ReadySet can be issued
to the underlying pipelines. If an instruction cannot be issued because one or
more pipeline resources are unavailable, then field
Instruction::CriticalResourceMask is updated with the resource identifier of the
unavailable resources.

If an instruction cannot be promoted from the PendingSet to the ReadySet because
of a memory dependency, then field Instruction::CriticalMemDep is updated with
the identifier of the dependending memory instruction.

Bottleneck information is collected after every cycle for instructions that are
waiting to execute. The idea is to help identify causes of bottlenecks; this
information can be used in future to implement a bottleneck analysis.

llvm-svn: 354490
2019-02-20 18:01:49 +00:00
Andrea Di Biagio
8cb6c4646d [MCA][ResourceManager] Add a table that maps processor resource indices to processor resource identifiers.
This patch adds a lookup table to speed up resource queries in the ResourceManager.
This patch also moves helper function 'getResourceStateIndex()' from
ResourceManager.cpp to Support.h, so that we can reuse that logic in the
SummaryView (and potentially other views in llvm-mca).
No functional change intended.

llvm-svn: 354470
2019-02-20 14:53:18 +00:00
Andrea Di Biagio
5df2126574 [MCA] Slightly refactor method writeStartEvent in WriteState and ReadState. NFCI
This is another change in preparation for PR37494.
No functional change intended.

llvm-svn: 354261
2019-02-18 11:27:11 +00:00
Andrea Di Biagio
665ec71dc0 [MCA] Improved code comment. NFC
llvm-svn: 354154
2019-02-15 18:28:11 +00:00
Andrea Di Biagio
5cafc02ac8 [MCA][LSUnit] Return the ID of the dependent memory operation from method
isReady(). NFCI

This is yet another change in preparation for a fix for PR37494.

llvm-svn: 354150
2019-02-15 18:05:59 +00:00
Andrea Di Biagio
dac58ab03c [MCA] Store a bitmask of used groups in the instruction descriptor.
This is to speedup 'checkAvailability' queries in class ResourceManager.
No functional change intended.

llvm-svn: 353949
2019-02-13 14:56:06 +00:00
Andrea Di Biagio
d467ca94d2 [MCA][Scheduler] Use latency information to further classify busy instructions.
This patch introduces a new instruction stage named 'IS_PENDING'.
An instruction transitions from the IS_DISPATCHED to the IS_PENDING stage if
input registers are not available, but their latency is known.

This patch also adds a new set of instructions named 'PendingSet' to class
Scheduler. The idea is that the PendingSet will only contain instructions that
have reached the IS_PENDING stage.
By construction, an instruction in the PendingSet is only dependent on
instructions that have already reached the execution stage. The plan is to use
this knowledge to identify bottlenecks caused by data dependencies (see
PR37494).

Differential Revision: https://reviews.llvm.org/D58066

llvm-svn: 353937
2019-02-13 11:02:42 +00:00
Andrea Di Biagio
acb3fbc2c0 [MCA] Improved debug prints. NFC
llvm-svn: 353852
2019-02-12 16:18:57 +00:00
Andrea Di Biagio
7a9fad3edf [MCA][Scheduler] Track resources that were found busy when issuing an instruction.
This is a follow up of r353706. When the scheduler fails to issue a ready
instruction to the underlying pipelines, it now updates a mask of 'busy resource
units'. That information will be used in future to obtain the set of
"problematic" resources in the case of bottlenecks caused by resource pressure.
No functional change intended.

llvm-svn: 353728
2019-02-11 17:55:47 +00:00
Andrea Di Biagio
2775e932aa [MCA] Return a mask of busy resources from method ResourceManager::checkAvailability(). NFCI
In case of bottlenecks caused by pipeline pressure, we want to be able to
correctly report the set of problematic pipelines. This is a first step towards
adding support for bottleneck hints in llvm-mca (see PR37494). No functional
change intended.

llvm-svn: 353706
2019-02-11 14:53:04 +00:00
Andrea Di Biagio
3e8820df08 [MCA] Moved the logic that updates register dependencies from DispatchStage to RegisterFile. NFC
DispatchStage should always delegate to an object of class RegisterFile the task
of updating data dependencies.  ReadState and WriteState objects should not be
modified directly by DispatchStage.
This patch also renames stage IS_AVAILABLE to IS_DISPATCHED.

llvm-svn: 353170
2019-02-05 14:11:41 +00:00
Andrea Di Biagio
5d3783c0d0 [MC][X86] Correctly model additional operand latency caused by transfer delays from the integer to the floating point unit.
This patch adds a new ReadAdvance definition named ReadInt2Fpu.
ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by
data transfers from the integer unit to the floating point unit.
ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all
x86 models excluding BtVer2. That means, this patch is only a functional change
for the Jaguar cpu model only.

Tablegen definitions for instructions (V)PINSR* have been updated to account for
the new ReadInt2Fpu. That read is mapped to the the GPR input operand.
On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch,
that extra delay was added to the opcode latency. In practice, the insert opcode
only executes for 1cy. Most of the actual latency is actually contributed by the
so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR*
latency is defined by expression f+1, where f is defined as a forwarding delay
from the integer unit to the fpu.

When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC
(only when flag -print-schedule is speified), we now need to account for any
extra forwarding delays. We do this by checking if scheduling classes declare
any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td:
"A negative advance effectively increases latency, which may be used for
cross-domain stalls". When computing the instruction latency for the purpose of
our scheduling tests, we now add any extra delay to the formula. This avoids
regressing existing codegen and mca schedule tests. It comes with the cost of an
extra (but very simple) hook in MCSchedModel.

Differential Revision: https://reviews.llvm.org/D57056

llvm-svn: 351965
2019-01-23 16:35:07 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Andrea Di Biagio
f4d1c262c1 [MCA] Fix wrong definition of ResourceUnitMask in DefaultResourceStrategy.
Field ResourceUnitMask was incorrectly defined as a 'const unsigned' mask. It
should have been a 64 bit quantity instead. That means, ResourceUnitMask was
always implicitly truncated to a 32 bit quantity.
This issue has been found by inspection. Surprisingly, that bug was latent, and
it never negatively affected any existing upstream targets.

This patch fixes  the wrong definition of ResourceUnitMask, and adds a bunch of
extra debug prints to help debugging potential issues related to invalid
processor resource masks.

llvm-svn: 350820
2019-01-10 13:59:13 +00:00
Evandro Menezes
ec2dad0049 [llvm-mca] Improve debugging (NFC)
llvm-svn: 350661
2019-01-08 22:29:38 +00:00
Andrea Di Biagio
2475907f4b [MCA] Improved handling of in-order issue/dispatch resources.
Added field 'MustIssueImmediately' to the instruction descriptor of instructions
that only consume in-order issue/dispatch processor resources.
This speeds up queries from the hardware Scheduler, and gives an average ~5%
speedup on a release build.

No functional change intended.

llvm-svn: 350397
2019-01-04 15:08:38 +00:00
Andrea Di Biagio
3b8b6c1505 [MCA] Store extra information about processor resources in the ResourceManager.
Method ResourceManager::use() is responsible for updating the internal state of
used processor resources, as well as notifying resource groups that contain used
resources.

Before this patch, method 'use()' didn't know how to quickly obtain the set of
groups that contain a particular resource unit. It had to discover groups by
perform a potentially slow search (done by iterating over the set of processor
resource descriptors).

With this patch, the relationship between resource units and groups is stored in
the ResourceManager. That means, method 'use()' no longer has to search for
groups. This gives an average speedup of ~4-5% on a release build.

This patch also adds extra code comments in ResourceManager.h to better describe
the resource mask layout, and how resouce indices are computed from resource
masks.

llvm-svn: 350387
2019-01-04 12:31:14 +00:00
Andrea Di Biagio
79e68d2ba7 [MCA] Improve code comment and reuse an helper function in ResourceManager. NFCI
llvm-svn: 350322
2019-01-03 14:47:46 +00:00
Andrea Di Biagio
431c9f6d7c [MCA] Add support for BeginGroup/EndGroup.
llvm-svn: 349354
2018-12-17 14:27:33 +00:00
Andrea Di Biagio
e6ecfd3ed6 [MCA] Don't assume that createMCInstrAnalysis() always returns a valid pointer.
Class InstrBuilder wrongly assumed that llvm targets were always able to return
a non-null pointer when createMCInstrAnalysis() was called on them.
This was causing crashes when simulating executions for targets that don't
provide an MCInstrAnalysis object.
This patch fixes the issue by making MCInstrAnalysis optional.

llvm-svn: 349352
2018-12-17 14:00:37 +00:00
Clement Courbet
9093bbf39e [llvm-mca] Move llvm-mca library to llvm/lib/MCA.
Summary: See PR38731.

Reviewers: andreadb

Subscribers: mgorny, javed.absar, tschuett, gbedwell, andreadb, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D55557

llvm-svn: 349332
2018-12-17 08:08:31 +00:00