llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Matt Davis	08d8849d2e	[llvm-mca] Move ResourceManager from Scheduler into its own file. NFC. This time I should be preserving history of the ResourceManager changes. llvm-svn: 340668	2018-08-24 22:59:13 +00:00
Matt Davis	e8a2060a78	[llvm-mca] Revert r340659. NFC. Choosing to revert the change and do it again, hopefully preserving the history of the changes by using svn copy instead of simply creating a new file from the contents within Scheduler. llvm-svn: 340661	2018-08-24 22:05:14 +00:00
Matt Davis	b6de49258b	[llvm-mca] Move the ResourceManger from the Scheduler into its own file. NFC. llvm-svn: 340659	2018-08-24 21:53:12 +00:00
Matt Davis	1e4dfd0b90	[llvm-mca] Move views and stats into a Views subdir. NFC. llvm-svn: 340645	2018-08-24 20:24:53 +00:00
Walter Lee	4105a127d9	[llvm-mca] Fix parameter name. NFC. llvm-svn: 340570	2018-08-23 20:17:42 +00:00
Matt Davis	06fdbc5048	[llvm-mca] Set the Selection strategy to Default if nullptr is passed. * Set (not reset) the strategy in Scheduler::setCustomStrategyImpl() llvm-svn: 340566	2018-08-23 18:42:37 +00:00
Andrea Di Biagio	a9095b3f03	[llvm-mca] Fix wrong call to setCustomStrategy(). Thanks to @waltl for reporting this issue. I have also added an assert to check for invalid null strategy objects, and I have reworded a couple of code comments in Scheduler.h llvm-svn: 340545	2018-08-23 17:09:08 +00:00
Andrea Di Biagio	51215f85a0	[llvm-mca] Allow the definition of custom strategies for selecting processor resource units. With this patch, users can now customize the pipeline selection strategy for scheduler resources. The resource selection strategy can be defined at processor resource granularity. This enables the definition of different strategies for different hardware schedulers. To override the strategy associated with a processor resource, users can call method ResourceManager::setCustomStrategy(), and pass a 'ResourceStrategy' object in input. Class ResourceStrategy is an abstract class which declares virtual method `ResourceStrategy::select()`. Method select() is meant to implement the actual strategy; it is responsible for picking the next best resource from a set of available pipeline resources. Custom strategy must simply override that method. By default, processor resources are associated with instances of 'DefaultResourceStrategy'. A 'DefaultResourceStrategy' internally implements a simple round-robin selector. For more details, please refer to the code comments in Scheduler.h. llvm-svn: 340536	2018-08-23 15:04:52 +00:00
Matt Davis	bba94ca910	[llvm-mca] Clean up a comment about the Context class. NFC. llvm-svn: 340431	2018-08-22 18:03:58 +00:00
Matt Davis	1e183f4e6a	[llvm-mca] Remove unused decl. NFC. llvm-svn: 340422	2018-08-22 17:15:25 +00:00
Andrea Di Biagio	208015540d	[llvm-mca] Improved code comments and moved some method definitions from Scheduler.h to Scheduler.cpp. NFC llvm-svn: 340395	2018-08-22 10:23:28 +00:00
Matt Davis	787d98fd2a	[llvm-mca] Remove unused decl. NFC. llvm-svn: 340316	2018-08-21 18:39:20 +00:00
Andrea Di Biagio	3225498e2b	[llvm-mca] Add the ability to customize the instruction selection strategy in the Scheduler. The constructor of Scheduler now accepts a SchedulerStrategy object, which is used internally by method Scheduler::select() to drive the instruction selection process. The goal of this patch is to enable the definition of custom selection strategies while reusing the same algorithms implemented by class Scheduler. The motivation is that, on some targets, the default strategy may not well approximate the selection logic in the hardware schedulers. This patch also adds the ability to pass a ResourceManager object to the constructor of Scheduler. This gives a bit more flexibility to the design, and potentially it allows to expose processor resources to SchedulerStrategy objects. Differential Revision: https://reviews.llvm.org/D51051 llvm-svn: 340314	2018-08-21 18:20:16 +00:00
Andrea Di Biagio	1ef6559797	[llvm-mca] Replace use of llvm::any_of with std::any_of. This should unbreak the buildbots. llvm-svn: 340274	2018-08-21 13:00:44 +00:00
Andrea Di Biagio	ab0b923dab	[llvm-mca] Add method cycleEvent() to class Scheduler. NFCI The goal of this patch is to simplify the Scheduler's interface in preparation for D50929. Some methods in the Scheduler's interface should not be exposed to external users, since their presence makes it hard to both understand, and extend the Scheduler's interface. This patch removes the following two methods from the public Scheduler's API: - reclaimSimulatedResources() - updatePendingQueue() Their logic has been migrated to a new method named 'cycleEvent()'. Methods 'updateIssuedSet()' and 'promoteToReadySet()' still exist. However, they are now private members of class Scheduler. This simplifies the interaction with the Scheduler from the ExecuteStage. llvm-svn: 340273	2018-08-21 12:40:15 +00:00
Matt Davis	3ace459e8d	[llvm-mca] Remove unused formal parameter. NFC. llvm-svn: 340227	2018-08-20 22:41:27 +00:00
Andrea Di Biagio	67969d60cf	[llvm-mca] Make the LSUnit a HardwareUnit, and allow derived classes to implement a different memory consistency model. The LSUnit is now a HardwareUnit, and it is owned by the mca::Context. Derived classes can now implement a different consistency model by overriding method `LSUnit::isReady()`. This patch also slightly refactors the Scheduler interface in the attempt to simplifying the interaction between ExecuteStage and the underlying Scheduler. llvm-svn: 340176	2018-08-20 14:41:36 +00:00
Matt Davis	4818ffc307	[llvm-mca] Reformat a few lines (fix spacing). NFC. llvm-svn: 340065	2018-08-17 18:06:01 +00:00
Andrea Di Biagio	fb9304872a	[llvm-mca] Removed references to HWStallEvent in Scheduler.h. NFCI class Scheduler should not know anything of hardware event listeners and hardware stall events (HWStallEvent). HWStallEvent objects should only be constructed by pipeline stages to notify listeners of hardware events. No functional change intended. llvm-svn: 340036	2018-08-17 15:01:37 +00:00
Andrea Di Biagio	1dc15644b9	[llvm-mca] Fix -Wpessimizing-move warnings introduced by r339923. Reported by buildbot `clang-with-lto-ubuntu` ( build #9858 ). llvm-svn: 339928	2018-08-16 19:45:13 +00:00
Andrea Di Biagio	081e1ff89b	[llvm-mca] Refactor how execution is orchestrated by the Pipeline. This patch changes how instruction execution is orchestrated by the Pipeline. In particular, this patch makes it more explicit how instructions transition through the various pipeline stages during execution. The main goal is to simplify both the stage API and the Pipeline execution. At the same time, this patch fixes some design issues which are currently latent, but that are likely to cause problems in future if people start defining custom pipelines. The new design assumes that each pipeline stage knows the "next-in-sequence". The Stage API has gained three new methods: - isAvailable(IR) - checkNextStage(IR) - moveToTheNextStage(IR). An instruction IR can be executed by a Stage if method `Stage::isAvailable(IR)` returns true. Instructions can move to next stages using method moveToTheNextStage(IR). An instruction cannot be moved to the next stage if method checkNextStage(IR) (called on the current stage) returns false. Stages are now responsible for moving instructions to the next stage in sequence if necessary. Instructions are allowed to transition through multiple stages during a single cycle (as long as stages are available, and as long as all the calls to `checkNextStage(IR)` returns true). Methods `Stage::preExecute()` and `Stage::postExecute()` have now become redundant, and those are removed by this patch. Method Pipeline::runCycle() is now simpler, and it correctly visits stages on every begin/end of cycle. Other changes: - DispatchStage no longer requires a reference to the Scheduler. - ExecuteStage no longer needs to directly interact with the RetireControlUnit. Instead, executed instructions are now directly moved to the next stage (i.e. the retire stage). - RetireStage gained an execute method. This allowed us to remove the dependency with the RCU in ExecuteStage. - FecthStage now updates the "program counter" during cycleBegin() (i.e. before we start executing new instructions). - We no longer need Stage::Status to be returned by method execute(). It has been dropped in favor of a more lightweight llvm::Error. Overally, I measured a ~11% performance gain w.r.t. the previous design. I also think that the Stage interface is probably easier to read now. That being said, code comments have to be improved, and I plan to do it in a follow-up patch. Differential revision: https://reviews.llvm.org/D50849 llvm-svn: 339923	2018-08-16 19:00:48 +00:00
Andrea Di Biagio	da0d5626a8	[llvm-mca] Small refactoring in preparation for another patch that will improve the modularity of the Pipeline. NFCI The main difference is that now `cycleStart()` and `cycleEnd()` return an llvm::Error. This patch implements a few minor style changes, and adds missing 'const' to some methods. llvm-svn: 339885	2018-08-16 15:43:09 +00:00
Andrea Di Biagio	10a086e001	[llvm-mca] Minor style changes. NFC llvm-svn: 339823	2018-08-15 22:11:05 +00:00
Andrea Di Biagio	128c334e6d	[llvm-mca] Fix PR38575: Avoid an invalid implicit truncation of a processor resource mask (an uint64_t value) to unsigned. This patch fixes a regression introduced at revision 338702. A processor resource mask was incorrectly implicitly truncated to an unsigned quantity. Later on, the truncated mask was used to initialize an element of a vector of processor resource descriptors. On targets with more than 32 processor resources, some elements of the vector are left uninitialized. As a consequence, this bug might have eventually caused a crash due to null dereference in the Scheduler. This patch fixes PR38575, and adds a test for it. llvm-svn: 339768	2018-08-15 12:53:38 +00:00
Matt Davis	8f7fca4891	[llvm-mca] Propagate fatal llvm-mca errors from library classes to driver. Summary: This patch introduces error handling to propagate the errors from llvm-mca library classes (or what will become library classes) up to the driver. This patch also introduces an enum to make clearer the intention of the return value for Stage::execute. This supports PR38101. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: llvm-commits, tschuett, gbedwell Differential Revision: https://reviews.llvm.org/D50561 llvm-svn: 339594	2018-08-13 18:11:48 +00:00
Matt Davis	ef7fb7a970	[llvm-mca] Make InstrBuilder::getOrCreateInstrDesc private. NFC. llvm-svn: 339468	2018-08-10 20:24:27 +00:00
Petr Hosek	a98aa0866c	[ADT] Normalize empty triple components LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294	2018-08-08 22:23:57 +00:00
Andrea Di Biagio	3d390604c7	[llvm-mca] Speed up the computation of the wait/ready/issued sets in the Scheduler. This patch is a follow-up to r338702. We don't need to use a map to model the wait/ready/issued sets. It is much more efficient to use a vector instead. This patch gives us an average 7.5% speedup (on top of the ~12% speedup obtained after r338702). llvm-svn: 338883	2018-08-03 12:55:28 +00:00
Andrea Di Biagio	8b13141f92	[llvm-mca] Use a vector to store ResourceState objects in the ResourceManager. We don't need to use a map to store ResourceState objects. The number of processor resources is known statically from the scheduling model. We can therefore use a vector, and reserve a slot for each processor resource that we want to simulate. Every time the ResourceManager queries the ResourceState vector, the index to the vector of ResourceState objects can be easily computed from the processor resource mask. This drastically reduces the time complexity of method ResourceManager::use() and method ResourceManager::release(). This patch gives an average speedup of 12%. llvm-svn: 338702	2018-08-02 11:12:35 +00:00
Andrea Di Biagio	692173433e	[llvm-mca] Correctly update the rank in `Scheduler::select()`. Found by inspection. llvm-svn: 338579	2018-08-01 16:06:33 +00:00
Andrea Di Biagio	e75426e0b8	[llvm-mca] Improve code comments. NFC. llvm-svn: 338513	2018-08-01 10:49:01 +00:00
Matt Davis	d0dbf0dd79	[llvm-mca] Update the help text to reflect "physical" registers. NFC. llvm-svn: 338430	2018-07-31 20:05:08 +00:00
Andrea Di Biagio	ec4329b38b	[llvm-mca] Remove README.txt A detailed description of the tool has been recently added by Matt to CommandGuide/llvm-mca.rst. File README.txt is now redundant and can be removed; all the relevant user-guide information has been improved and then moved to llvm-mca.rst. In future, we should add another .rst for the "llvm-mca developer manual" to provide infromation about: - llvm-mca internals. - How to add custom stages to the simulated pipeline. - How to provide extra processor info in the scheduling model to improve the analysis performed by llvm-mca. llvm-svn: 338386	2018-07-31 14:23:49 +00:00
Andrea Di Biagio	0e53532aeb	[llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. This patch teaches llvm-mca how to identify dependency breaking instructions on btver2. An example of dependency breaking instructions is the zero-idiom XOR (example: `XOR %eax, %eax`), which always generates zero regardless of the actual value of the input register operands. Dependency breaking instructions don't have to wait on their input register operands before executing. This is because the computation is not dependent on the inputs. Not all dependency breaking idioms are also zero-latency instructions. For example, `CMPEQ %xmm1, %xmm1` is independent on the value of XMM1, and it generates a vector of all-ones. That instruction is not eliminated at register renaming stage, and its opcode is issued to a pipeline for execution. So, the latency is not zero. This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an instruction (i.e. MCInst) and a MCSubtargetInfo. The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override the default behavior for specific CPUs, and return a value which better matches the subtarget behavior. In future, we should teach to Tablegen how to automatically generate the body of isDependencyBreaking from scheduling predicate definitions. This would allow us to expose the knowledge about dependency breaking instructions to the machine schedulers (and, potentially, other codegen passes). Differential Revision: https://reviews.llvm.org/D49310 llvm-svn: 338372	2018-07-31 13:21:43 +00:00
Dean Michael Berris	79adceb6c6	[MCA] Avoid an InstrDesc copy in mca::LSUnit::reserve. Summary: InstrDesc contains 4 vectors (as well as some other data), so it's expensive to copy. Authored By: orodley Reviewers: andreadb, mattd, dberris Reviewed By: mattd, dberris Subscribers: dberris, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49775 llvm-svn: 337985	2018-07-26 00:02:54 +00:00
Andrea Di Biagio	c19db3b1d5	[llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers. The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes. On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycles) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf). Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf: " The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it." This patch is a first important step towards improving the analysis of partial register updates. It changes the semantic of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details: see the new code comments in include/Target/TargetSchedule.h - class RegisterFile). This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH)) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage). This is a very interesting article on the subject with a very informative answer from Peter Cordes: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write. Differential Revision: https://reviews.llvm.org/D49196 llvm-svn: 337123	2018-07-15 11:01:38 +00:00
Matt Davis	7599dd7f50	[llvm-mca] Turn InstructionTables into a Stage. Summary: This patch converts the InstructionTables class into a subclass of mca::Stage. This change allows us to use the Stage's inherited Listeners for event notifications. This also allows us to create a simple pipeline for viewing the InstructionTables report. I have been working on a follow on patch that should cleanup addView in InstructionTables. Right now, addView adds the view to both the Listener list and Views list. The follow-on patch addresses the fact that we don't really need two lists in this case. That change is not specific to just InstructionTables, so it will be a separate patch. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49329 llvm-svn: 337113	2018-07-14 23:52:50 +00:00
Matt Davis	4391ce3837	[llvm-mca] Remove unused InstRef formal from pre and post execute callbacks. NFC. llvm-svn: 337077	2018-07-14 00:10:42 +00:00
Andrea Di Biagio	642e2e539c	[llvm-mca] Improve a few debug prints. NFC llvm-svn: 337003	2018-07-13 14:55:47 +00:00
Andrea Di Biagio	2c65d72a0f	[llvm-mca] Simplify the Pipeline constructor. NFC llvm-svn: 336984	2018-07-13 09:31:02 +00:00
Andrea Di Biagio	3af21f0841	[llvm-mca] Removed unused arguments from methods in class Pipeline. NFC llvm-svn: 336983	2018-07-13 09:27:34 +00:00
Matt Davis	bcf4db053e	[llvm-mca] Constify SourceMgr::hasNext. NFC. llvm-svn: 336961	2018-07-12 23:19:30 +00:00
Matt Davis	3de6f07d48	[llvm-mca] Add cycleBegin/cycleEnd callbacks to mca::Stage. Summary: This patch clears up some of the semantics within the Stage class. Now, preExecute can be called multiple times per simulated cycle. Previously preExecute was only called once per cycle, and postExecute could have been called multiple times. Now, cycleStart/cycleEnd are called only once per simulated cycle. preExecute/postExecute can be called multiple times per cycle. This occurs because multiple execution events can occur during a single cycle. When stages are executed (Pipeline::runCycle), the postExecute hook will be called only if all Stages return a success from their 'execute' callback. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb Subscribers: tschuett, gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49250 llvm-svn: 336959	2018-07-12 22:59:53 +00:00
Matt Davis	9919114951	[llvm-mca] Simplify eventing by adding an onEvent templated method. Summary: This patch eliminates some redundancy in iterating across Listeners for the Instruction and Stall HWEvents, by introducing a template onEvent routine. This change was suggested by @courbet in https://reviews.llvm.org/D48576. I hope that this patch addresses that suggestion appropriately. I do like this change better than what we had previously. Reviewers: andreadb, courbet, RKSimon Reviewed By: andreadb, courbet Subscribers: javed.absar, tschuett, gbedwell, llvm-commits, courbet Differential Revision: https://reviews.llvm.org/D48672 llvm-svn: 336916	2018-07-12 16:56:17 +00:00
Andrea Di Biagio	e2a4194fea	[llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC This makes easier to identify changes in the instruction info flags. It also helps spotting potential regressions similar to the one recently introduced at r336728. Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and spaces. A change in position of the flag marker may not trigger a test failure. This patch only changes the character used for flag `hasSideEffects`. The reason why I didn't touch other flags is because I want to avoid spamming the mailing because of the massive diff due to the numerous tests affected by this change. In future, each instruction flag should be associated with a different character in the Instruction Info View. llvm-svn: 336797	2018-07-11 12:44:44 +00:00
Andrea Di Biagio	aaaf1382f8	[llvm-mca] report an error if the assembly sequence contains an unsupported instruction. This is a short-term fix for PR38093. For now, we llvm::report_fatal_error if the instruction builder finds an unsupported instruction in the instruction stream. We need to revisit this fix once we start addressing PR38101. Essentially, we need a better framework for error handling. llvm-svn: 336543	2018-07-09 12:30:55 +00:00
Matt Davis	7464e6c913	[llvm-mca] Add HardwareUnit and Context classes. This patch moves the construction of the default backend from llvm-mca.cpp and into mca::Context. The Context class is responsible for holding ownership of the simulated hardware components. These components are subclasses of HardwareUnit. Right now the HardwareUnit is pretty bare-bones, but eventually we might want to add some common functionality across all hardware components, such as isReady() or something similar. I have a feeling this patch will probably need some updates, but it's a start. One thing I am not particularly fond of is the rather large interface for createDefaultPipeline. That convenience routine takes a rather large set of inputs from the llvm-mca driver, where many of those inputs are generated via command line options. One item I think we might want to change is the separating of ownership of hardware components (owned by the context) and the pipeline (which owns Stages). In short, a Pipeline owns Stages, a Context (currently) owns hardware. The Pipeline's Stages make use of the components, and thus there is a lifetime dependency generated. The components must outlive the pipeline. We could solve this by having the Context also own the Pipeline, and not return a unique_ptr<Pipeline>. Now that I think about it, I like that idea more. Differential Revision: https://reviews.llvm.org/D48691 llvm-svn: 336456	2018-07-06 18:03:14 +00:00
Andrea Di Biagio	f4b2508e93	[llvm-mca] A write latency cannot be a negative value. NFC llvm-svn: 336437	2018-07-06 13:46:10 +00:00
Andrea Di Biagio	46e908a592	[llvm-mca] improve the instruction issue logic implemented by the Scheduler. This patch modifies the Scheduler heuristic used to select the next instruction to issue to the pipelines. The motivating example is test X86/BtVer2/add-sequence.s, for which llvm-mca wrongly reported an estimated IPC of 1.50. According to perf, the actual IPC for that test should have been ~2.00. It turns out that an IPC of 2.00 for test add-sequence.s cannot possibly be predicted by a Scheduler that only prioritizes instructions based on their "age". A similar issue also affected test X86/BtVer2/dependent-pmuld-paddd.s, for which llvm-mca wrongly estimated an IPC of 0.84 instead of an IPC of 1.00. Instructions in the ReadyQueue are now ranked based on two factors: - The "age" of an instruction. - The number of unique users of writes associated with an instruction. The new logic still prioritizes older instructions over younger instructions to minimize the pressure on the reorder buffer. However, the number of users of an instruction now also affects the overall rank. This potentially increases the ability of the Scheduler to extract instruction level parallelism. This patch fixes the problem with the wrong IPC reported for test add-sequence.s and test dependent-pmuld-paddd.s. llvm-svn: 336420	2018-07-06 08:08:30 +00:00
Andrea Di Biagio	c51f4304c7	[llvm-mca] Fix RegisterFile debug prints. NFC llvm-svn: 336367	2018-07-05 16:13:49 +00:00

1 2 3 4

199 Commits