llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

Author	SHA1	Message	Date
Andrea Di Biagio	98fc61c0dc	[X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s. A ReadAdvance was incorrectly added to the SchedReadWrite list associated with the following SSE instructions: sqrtss, sqrtsd, rsqrtss, rcpss. As a consequence, a wrong operand latency was computed for the register operand used as the base address of the folded load operand. This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases. There is still a problem with correctly modeling partial register writes on XMM registers. This other problem is currently tracked here: https://bugs.llvm.org/show_bug.cgi?id=38813 Differential Revision: https://reviews.llvm.org/D51542 llvm-svn: 341326	2018-09-03 16:47:34 +00:00
Andrea Di Biagio	24700e66e8	[X86][BtVer2] Remove wrong ReadAdvance from AVX vbroadcast(ss|sd|f128) instructions. The presence of a ReadAdvance for input operand #0 is problematic because it changes the input latency of the register used as the base address for the folded load. A broadcast cannot start executing if the load address hasn't been computed yet. In the llvm-mca example, the VBROADCASTSS is dependent on the address generated by the LEAQ. That means it cannot start until LEAQ reaches the write-back stage. If we apply ReadAdvance, then we wrongly assume that the load can start 3 cycles in advance. Differential Revision: https://reviews.llvm.org/D51534 llvm-svn: 341222	2018-08-31 16:05:48 +00:00
Andrea Di Biagio	bfbd45c5ba	[X86] Add llvm-mca tests that show how operand latency is wrongly computed for SSE sqrtss/sd and rcpss. According to the timeline view, sqrtss/sd/rcpss start executing before the load address for the memory operand is available. This problem is caused by the presence of a ReadAfterLd (a ReadAdvance). Those unary operations should not specify a ReadAdvance at all. llvm-svn: 341213	2018-08-31 14:12:13 +00:00
Andrea Di Biagio	5c0fd2b844	[X86][BtVer2] Add an llvm-mca test that shows how the read latency of AVX broadcastss on ymm registers is incorrectly set. llvm-svn: 341197	2018-08-31 10:39:33 +00:00
Andrea Di Biagio	b85f875aab	[X86][BtVer2] Fix WriteFShuffle256 schedule write info. This patch fixes the number of micro opcodes, and processor resource cycles for the following AVX instructions: vinsertf128rr/rm vperm2f128rr/rm vbroadcastf128 Tests have been regenerated using the usual scripts in the llvm/utils directory. Differential Revision: https://reviews.llvm.org/D51492 llvm-svn: 341185	2018-08-31 08:30:47 +00:00
Andrea Di Biagio	7e5d9331c7	[llvm-mca] Report the number of dispatched micro opcodes in the DispatchStatistics view. This patch introduces the following changes to the DispatchStatistics view: * DispatchStatistics now reports the number of dispatched opcodes instead of the number of dispatched instructions. * The "Dynamic Dispatch Stall Cycles" table now also reports the percentage of stall cycles against the total simulated cycles. This change allows users to easily compare dispatch group sizes with the processor DispatchWidth. Before this change, it was difficult to correlate the two numbers, since DispatchStatistics view reported numbers of instructions (instead of opcodes). DispatchWidth defines the maximum size of a dispatch group in terms of number of micro opcodes. The other change introduced by this patch is related to how DispatchStage generates "instruction dispatch" events. In particular: * There can be multiple dispatch events associated with the same instruction * Each dispatch event now encapsulates the number of dispatched micro opcodes. The number of micro opcodes declared by an instruction may exceed the processor DispatchWidth. Therefore, we cannot assume that instructions are always fully dispatched in a single cycle. DispatchStage knows already how to handle instructions declaring a number of opcodes bigger than DispatchWidth. However, DispatchStage always emitted a single instruction dispatch event (during the first simulated dispatch cycle) for instructions dispatched. With this patch, DispatchStage now correctly notifies multiple dispatch events for instructions that cannot be dispatched in a single cycle. A few views had to be modified. Views can no longer assume that there can only be one dispatch event per instruction. Tests (and docs) have been updated. Differential Revision: https://reviews.llvm.org/D51430 llvm-svn: 341055	2018-08-30 10:50:20 +00:00
Andrew V. Tischenko	dff9d04945	[X86] Improved sched model for X86 CMPXCHG* instructions. Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024	2018-08-30 06:26:00 +00:00
Andrea Di Biagio	80b01d0203	[llvm-mca] Add fields "Total uOps" and "uOps Per Cycle" to the report generated by the SummaryView. This patch adds two new fields to the perf report generated by the SummaryView. Fields are now logically organized into two small groups; only the second group contains throughput indicators. Example: ``` Iterations: 100 Instructions: 300 Total Cycles: 414 Total uOps: 700 Dispatch Width: 4 uOps Per Cycle: 1.69 IPC: 0.72 Block RThroughput: 4.0 ``` This patch also updates the docs for llvm-mca. Due to the nature of this change, several tests in the tools/llvm-mca directory were affected, and had to be updated using script `update_mca_test_checks.py`. llvm-svn: 340946	2018-08-29 17:56:39 +00:00
Andrea Di Biagio	989c373718	[llvm-mca] Don't disable the SummaryView if flag `-all-stats` is false. llvm-svn: 340945	2018-08-29 17:40:04 +00:00
Andrea Di Biagio	159300cdaf	[llvm-mca][TimelineView] Force the same number of executions for every entry in the 'wait-times' table. This patch also uses colors to highlight problematic wait-time entries. A problematic entry is an entry with a high wait time that tends to match (or exceed) the size of the scheduler's buffer. Color RED is used if an instruction had to wait an average number of cycles which is bigger than (or equal to) the size of the underlying scheduler's buffer. Color YELLOW is used if the time (in cycles) spent waiting for the operands or pipeline resources is bigger than half the size of the underlying scheduler's buffer. Color MAGENTA is used if an instruction does not consume buffer resources according to the scheduling model. llvm-svn: 340825	2018-08-28 14:27:01 +00:00
Andrea Di Biagio	f707cd4166	[llvm-mca] Improved report generated by the SchedulerStatistics view. Before this patch, the SchedulerStatistics only printed the maximum number of buffer entries consumed in each scheduler's queue at a given point of the simulation. This patch restructures the reported table, and adds an extra field named "Average number of used buffer entries" to it. This patch also uses different colors to help identifying bottlenecks caused by high scheduler's buffer pressure. llvm-svn: 340746	2018-08-27 14:52:52 +00:00
Andrea Di Biagio	128c334e6d	[llvm-mca] Fix PR38575: Avoid an invalid implicit truncation of a processor resource mask (a uint64_t value) to unsigned. This patch fixes a regression introduced at revision 338702. A processor resource mask was incorrectly implicitly truncated to an unsigned quantity. Later on, the truncated mask was used to initialize an element of a vector of processor resource descriptors. On targets with more than 32 processor resources, some elements of the vector are left uninitialized. As a consequence, this bug might have eventually caused a crash due to null dereference in the Scheduler. This patch fixes PR38575, and adds a test for it. llvm-svn: 339768	2018-08-15 12:53:38 +00:00
Andrew V. Tischenko	8da7fefc11	[X86] MCA tests for XCHG, XADD and CMPXCHG* instructions Differential Revision: https://reviews.llvm.org/D49912 llvm-svn: 339145	2018-08-07 14:36:43 +00:00
Simon Pilgrim	5661768583	[llvm-mca][x86] Add CMPXCHG instruction resource tests I've put CMPXCHG8B/CMPXCHG16B in the same file, even though technically they are under separate CPUID bits all targets seem to support both (or neither). llvm-svn: 338595	2018-08-01 17:25:11 +00:00
Simon Pilgrim	65f4cb4b44	[llvm-mca][x86] Add PREFETCHW instruction resource tests These aren't just available via 3DNow! so test for them separately as well. llvm-svn: 338584	2018-08-01 16:34:39 +00:00
Simon Pilgrim	1a89eb29f1	[llvm-mca][x86] Add PCLMUL instruction resource tests Renamed the btver2 file that already contained them - the other targets were only testing the AVX versions llvm-svn: 338583	2018-08-01 16:25:50 +00:00
Andrea Di Biagio	692173433e	[llvm-mca] Correctly update the rank in `Scheduler::select()`. Found by inspection. llvm-svn: 338579	2018-08-01 16:06:33 +00:00
Simon Pilgrim	296b5490bb	[llvm-mca][x86] Add SET/TEST instruction resource tests llvm-svn: 338576	2018-08-01 15:29:47 +00:00
Simon Pilgrim	1ede106fb5	[llvm-mca][x86] Add LEA instruction resource tests We already added these to btver2, now add them to other targets, even though none of their models treat them specially (yet). llvm-svn: 338565	2018-08-01 14:25:33 +00:00
Simon Pilgrim	e24e0bb1e0	[llvm-mca][x86] Add more x86-64 system instruction resource tests CPUID, IN/OUT, INS/OUTS, INT, PAUSE, SCAS, UD2, XLAT llvm-svn: 338563	2018-08-01 14:18:09 +00:00
Simon Pilgrim	f81c102500	[llvm-mca][x86] Add CLFLUSHOPT instruction resource tests llvm-svn: 338550	2018-08-01 13:34:17 +00:00
Simon Pilgrim	1f819b9834	[llvm-mca][x86] Add CMPS/LODS/MOVS/STOS string instruction resource tests llvm-svn: 338532	2018-08-01 13:14:45 +00:00
Simon Pilgrim	bdd08107f3	[llvm-mca][x86] Add STC + STD instruction resource tests llvm-svn: 338514	2018-08-01 11:00:11 +00:00
Simon Pilgrim	676985b151	[llvm-mca][x86] Add 32-bit instruction resource tests These aren't exhaustive, but cover some instructions that are only available in 32-bit mode (where would we be without good BCD math performance?). llvm-svn: 338404	2018-07-31 17:33:08 +00:00
Andrea Di Biagio	0e53532aeb	[llvm-mca][BtVer2] Teach how to identify dependency-breaking idioms. This patch teaches llvm-mca how to identify dependency breaking instructions on btver2. An example of dependency breaking instructions is the zero-idiom XOR (example: `XOR %eax, %eax`), which always generates zero regardless of the actual value of the input register operands. Dependency breaking instructions don't have to wait on their input register operands before executing. This is because the computation is not dependent on the inputs. Not all dependency breaking idioms are also zero-latency instructions. For example, `CMPEQ %xmm1, %xmm1` is independent of the value of XMM1, and it generates a vector of all-ones. That instruction is not eliminated at register renaming stage, and its opcode is issued to a pipeline for execution. So, the latency is not zero. This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis interface. That method takes as input an instruction (i.e. MCInst) and a MCSubtargetInfo. The default implementation of isDependencyBreaking() conservatively returns false for all instructions. Targets may override the default behavior for specific CPUs, and return a value which better matches the subtarget behavior. In future, we should teach Tablegen how to automatically generate the body of isDependencyBreaking from scheduling predicate definitions. This would allow us to expose the knowledge about dependency breaking instructions to the machine schedulers (and, potentially, other codegen passes). Differential Revision: https://reviews.llvm.org/D49310 llvm-svn: 338372	2018-07-31 13:21:43 +00:00
Roman Lebedev	a322744eae	[NFC][MCA] ZnVer1: Update RegisterFile to identify false dependencies on partially written registers. Summary: Pretty mechanical follow-up for D49196. As microarchitecture.pdf notes, "20 AMD Ryzen pipeline", "20.8 Register renaming and out-of-order schedulers": The integer register file has 168 physical registers of 64 bits each. The floating point register file has 160 registers of 128 bits each. "20.14 Partial register access": The processor always keeps the different parts of an integer register together. ... An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it. Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49393 llvm-svn: 337676	2018-07-23 10:10:13 +00:00
Roman Lebedev	c103239ef6	[NFC][MCA] ZnVer1: add partial-reg-update tests Reviewers: andreadb, courbet, RKSimon, craig.topper, GGanesh Reviewed By: GGanesh Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D49392 llvm-svn: 337675	2018-07-23 10:10:04 +00:00
Simon Pilgrim	ab78f6876b	[llvm-mca][x86] Add movsx/movzx instructions to general x86_64 resource tests llvm-svn: 337586	2018-07-20 17:43:42 +00:00
Andrea Di Biagio	0792e8ab30	[X86][BtVer2] correctly model the latency/throughput of LEA instructions. This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model. On Jaguar, a 3-operand LEA has a latency of 2cy, and a reciprocal throughput of 1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operand LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. RThroughput of 2.0). This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td. The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc): ``` static bool isThreeOperandsLEA(const MachineInstr &MI) { return ( ( MI.getOpcode() == X86::LEA32r || MI.getOpcode() == X86::LEA64r || MI.getOpcode() == X86::LEA64_32r || MI.getOpcode() == X86::LEA16r ) && MI.getOperand(1).isReg() && MI.getOperand(1).getReg() != 0 && MI.getOperand(3).isReg() && MI.getOperand(3).getReg() != 0 && ( ( MI.getOperand(4).isImm() && MI.getOperand(4).getImm() != 0 ) || (MI.getOperand(4).isGlobal()) ) ); } ``` A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h). Back to the BtVer2 scheduling model: A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operand LEA, or it is an LEA with a Scale value different than 1. A variant scheduling class uses that new predicate to correctly select the appropriate latency profile. Differential Revision: https://reviews.llvm.org/D49436 llvm-svn: 337469	2018-07-19 16:42:15 +00:00
Simon Pilgrim	7b9d1cc945	[llvm-mca][x86] Add extend, carry-flag and CMP instructions to general x86_64 resource tests llvm-svn: 337306	2018-07-17 17:47:35 +00:00
Simon Pilgrim	d7c8b9d075	[llvm-mca][x86] Add MOVBE resource tests to all supporting targets SNB doesn't support MOVBE but the numbers in Generic (which use the SNB model) look sane. llvm-svn: 337305	2018-07-17 17:41:45 +00:00
Simon Pilgrim	46a3f67161	[llvm-mca][x86] Add BSWAP resource tests llvm-svn: 337302	2018-07-17 17:10:47 +00:00
Simon Pilgrim	7bb344125a	[llvm-mca][x86] Add displacement-only and additional scale=1 LEA tests llvm-svn: 337298	2018-07-17 16:17:33 +00:00
Simon Pilgrim	6a22898a11	[llvm-mca][x86] Add LEA resource tests (PR32326) Add llvm-mca tests demonstrating how LEA instructions are currently modelled. Once this is working on btver2 I'll copy the test file to the other target directories. llvm-svn: 337297	2018-07-17 16:13:29 +00:00
Andrea Di Biagio	b2b164d670	[llvm-mca] Regenerate X86 specific tests. NFC Not all tests were correctly updated by the update script after r336797. llvm-svn: 337124	2018-07-15 11:43:11 +00:00
Andrea Di Biagio	c19db3b1d5	[llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers. The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes. On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf). Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf: "The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it." This patch is a first important step towards improving the analysis of partial register updates. It changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details: see the new code comments in include/Target/TargetSchedule.h - class RegisterFile). This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage). This is a very interesting article on the subject with a very informative answer from Peter Cordes: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write. Differential Revision: https://reviews.llvm.org/D49196 llvm-svn: 337123	2018-07-15 11:01:38 +00:00
Andrea Di Biagio	b054f8cd4b	[llvm-mca][BtVer2] Add tests for dependency breaking instructions. llvm-svn: 337024	2018-07-13 16:46:51 +00:00
Andrea Di Biagio	6b6463f348	[X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions. Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was inferred directly from the "default" pattern associated with the instruction definition. r336728 removed special node X86Movlps, and all the patterns associated to it. Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the 'mayLoad/hasSideEffects' flags are left unset. When the instruction info is emitted by tablegen, method CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a pattern, and flags are undefined. So, it conservatively sets the "hasSideEffects" flag for it. As a consequence, we were losing the 'mayLoad' flag, and we were gaining a 'hasSideEffect' flag in its place. This patch fixes the issue (originally reported by Michael Holmen). The mca tests show the differences in the instruction info flags. Instructions that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm. Differential Revision: https://reviews.llvm.org/D49182 llvm-svn: 336818	2018-07-11 15:27:50 +00:00
Andrea Di Biagio	e2a4194fea	[llvm-mca] Use a different character to flag instructions with side-effects in the Instruction Info View. NFC This makes it easier to identify changes in the instruction info flags. It also helps spotting potential regressions similar to the one recently introduced at r336728. Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and spaces. A change in position of the flag marker may not trigger a test failure. This patch only changes the character used for flag `hasSideEffects`. The reason why I didn't touch other flags is because I want to avoid spamming the mailing list because of the massive diff due to the numerous tests affected by this change. In future, each instruction flag should be associated with a different character in the Instruction Info View. llvm-svn: 336797	2018-07-11 12:44:44 +00:00
Andrea Di Biagio	d8954c65cb	[llvm-mca] Add tests for partial register writes. llvm-mca doesn't know that on modern AMD processors, portions of a general purpose register are not treated independently. So, a partial register write has a false dependency on the super-register. The issue with partial register writes will be addressed by a follow-up patch. llvm-svn: 336778	2018-07-11 09:50:00 +00:00
Andrea Di Biagio	aaaf1382f8	[llvm-mca] report an error if the assembly sequence contains an unsupported instruction. This is a short-term fix for PR38093. For now, we llvm::report_fatal_error if the instruction builder finds an unsupported instruction in the instruction stream. We need to revisit this fix once we start addressing PR38101. Essentially, we need a better framework for error handling. llvm-svn: 336543	2018-07-09 12:30:55 +00:00
Roman Lebedev	f199fb4bf4	[MCA][X86][NFC] Add BSF/BSR resource tests Reviewers: RKSimon, andreadb, courbet Reviewed By: RKSimon Subscribers: gbedwell, llvm-commits Differential Revision: https://reviews.llvm.org/D48997 llvm-svn: 336510	2018-07-08 09:50:14 +00:00
Andrea Di Biagio	46e908a592	[llvm-mca] improve the instruction issue logic implemented by the Scheduler. This patch modifies the Scheduler heuristic used to select the next instruction to issue to the pipelines. The motivating example is test X86/BtVer2/add-sequence.s, for which llvm-mca wrongly reported an estimated IPC of 1.50. According to perf, the actual IPC for that test should have been ~2.00. It turns out that an IPC of 2.00 for test add-sequence.s cannot possibly be predicted by a Scheduler that only prioritizes instructions based on their "age". A similar issue also affected test X86/BtVer2/dependent-pmuld-paddd.s, for which llvm-mca wrongly estimated an IPC of 0.84 instead of an IPC of 1.00. Instructions in the ReadyQueue are now ranked based on two factors: - The "age" of an instruction. - The number of unique users of writes associated with an instruction. The new logic still prioritizes older instructions over younger instructions to minimize the pressure on the reorder buffer. However, the number of users of an instruction now also affects the overall rank. This potentially increases the ability of the Scheduler to extract instruction level parallelism. This patch fixes the problem with the wrong IPC reported for test add-sequence.s and test dependent-pmuld-paddd.s. llvm-svn: 336420	2018-07-06 08:08:30 +00:00
Roman Lebedev	213b1bad80	[X86][BtVer2][MCA][NFC] Add CMPEQ dependency-breaking one-idioms tests Summary: As per `Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions)`, these, like zero-idioms, are dependency-breaking, although they produce ones and still consume resources. FIXME: as discussed in D48877, llvm-mca handling is broken for these. Reviewers: andreadb Reviewed By: andreadb Subscribers: gbedwell, RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D48876 llvm-svn: 336292	2018-07-04 17:32:44 +00:00
Fangrui Song	d9ba18363e	Replace unused output filenames with /dev/null in tests Similar to rLLD336129 llvm-svn: 336131	2018-07-02 18:16:44 +00:00
Simon Pilgrim	dcbc6b0387	[llvm-mca][x86] Add FMA4 resource tests We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model. llvm-svn: 335870	2018-06-28 16:24:13 +00:00
Simon Pilgrim	5833866542	[llvm-mca][x86] Add 3dnow! resource tests We should be ensuring we have (near) complete test coverage of instructions, at least for the generic model. llvm-svn: 335869	2018-06-28 16:21:22 +00:00
Andrea Di Biagio	4893e095df	[llvm-mca][X86] Teach how to identify register writes that implicitly clear the upper portion of a super-register. This patch teaches llvm-mca how to identify register writes that implicitly zero the upper portion of a super-register. On X86-64, a general purpose register is implemented in hardware as a 64-bit register. Quoting the Intel 64 Software Developer's Manual: "an update to the lower 32 bits of a 64 bit integer register is architecturally defined to zero extend the upper 32 bits". Also, a write to an XMM register performed by an AVX instruction implicitly zeroes the upper 128 bits of the aliasing YMM register. This patch adds a new method named clearsSuperRegisters to the MCInstrAnalysis interface to help identify instructions that implicitly clear the upper portion of a super-register. The rest of the patch teaches llvm-mca how to use that new method to obtain the information, and update the register dependencies accordingly. I compared the kernels from tests clear-super-register-1.s and clear-super-register-2.s against the output from perf on btver2. Previously there was a large discrepancy between the estimated IPC and the measured IPC. Now the differences are mostly in the noise. Differential Revision: https://reviews.llvm.org/D48225 llvm-svn: 335113	2018-06-20 10:08:11 +00:00
Roman Lebedev	489484b98d	[X86][Znver1] Specify Register Files, RCU; FP scheduler capacity. Summary: First off: i do not have any access to that processor, so this is purely theoretical, no benchmarks. I have been looking into bdver2 scheduling profile, and while cross-referencing the existing btver2, znver1 profiles, and the reference docs (`Software Optimization Guide for AMD Family {15,16,17}h Processors`), i have noticed that only btver2 scheduling profile specifies these. Also, there is no mca test coverage. Reviewers: RKSimon, craig.topper, courbet, GGanesh, andreadb Reviewed By: GGanesh Subscribers: gbedwell, vprasad, ddibyend, shivaram, Ashutosh, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D47676 llvm-svn: 335099	2018-06-20 07:01:14 +00:00
Clement Courbet	59f6b366fb	[X86] Fix r335097 Missed `Generic` test in llvm-mca. llvm-svn: 335098	2018-06-20 06:44:13 +00:00
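
The SummaryView example quoted in r340946 (Instructions: 300, Total Cycles: 414, Total uOps: 700) is consistent with IPC and "uOps Per Cycle" being plain ratios over the total cycle count. A minimal sketch of that arithmetic follows; the `SummaryCounters` struct and both helper functions are hypothetical names introduced here for illustration and are not taken from the llvm-mca sources.

```cpp
// Hypothetical container for the raw counters shown in the SummaryView report.
// This is an illustration only, not the llvm-mca implementation.
struct SummaryCounters {
  unsigned Instructions; // total simulated instructions (all iterations)
  unsigned TotalCycles;  // total simulated cycles
  unsigned TotalUOps;    // total simulated micro opcodes
};

// Instructions per cycle; returns 0 when no cycle was simulated.
double computeIPC(const SummaryCounters &C) {
  if (C.TotalCycles == 0)
    return 0.0;
  return static_cast<double>(C.Instructions) / C.TotalCycles;
}

// Micro opcodes per cycle; returns 0 when no cycle was simulated.
double computeUOpsPerCycle(const SummaryCounters &C) {
  if (C.TotalCycles == 0)
    return 0.0;
  return static_cast<double>(C.TotalUOps) / C.TotalCycles;
}
```

With the counters from the commit message, `computeIPC` yields roughly 0.72 and `computeUOpsPerCycle` roughly 1.69, matching the report once rounded to two decimals.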
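
The truncation bug fixed in r339768 (PR38575) can be reproduced in isolation. The sketch below assumes, purely for illustration, that each processor resource owns one bit of a 64-bit mask, and it uses `uint32_t` to stand in for a 32-bit `unsigned`. The function names are hypothetical and do not come from the llvm-mca sources.

```cpp
#include <cstdint>

// Illustrative one-bit-per-resource mask (an assumption for this sketch).
// Index must be smaller than 64.
uint64_t maskForResource(unsigned Index) {
  return uint64_t(1) << Index;
}

// Mimics the buggy pattern: the 64-bit mask is narrowed to 32 bits.
// uint32_t stands in for `unsigned` on targets where unsigned is 32 bits wide.
uint32_t truncateMask(uint64_t Mask) {
  return static_cast<uint32_t>(Mask);
}

// Returns true if the mask still identifies the resource after narrowing.
bool survivesTruncation(uint64_t Mask) {
  return static_cast<uint64_t>(truncateMask(Mask)) == Mask;
}
```

Under these assumptions, the mask for resource index 31 survives the narrowing, while the mask for index 32 collapses to zero. That matches the commit's observation that only targets with more than 32 processor resources were affected.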
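
The color rules that r340825 describes for the TimelineView 'wait-times' table can be read as a small classification function. The sketch below is one possible reading of that commit message, not the actual llvm-mca code: the enum, the function name, the parameters, and the order in which the checks run are all assumptions.

```cpp
// Hypothetical color categories for a wait-time entry.
enum class WaitColor { Default, Yellow, Red, Magenta };

// Classifies an entry following the rules in the commit message:
//  - MAGENTA: the instruction consumes no scheduler buffer resources;
//  - RED:     average wait >= scheduler buffer size;
//  - YELLOW:  average wait > half the scheduler buffer size.
// Checking the MAGENTA case first is an assumption of this sketch.
WaitColor classifyWaitTime(double AverageWaitCycles, unsigned BufferSize,
                           bool ConsumesBufferResources) {
  if (!ConsumesBufferResources)
    return WaitColor::Magenta;
  if (AverageWaitCycles >= BufferSize)
    return WaitColor::Red;
  if (AverageWaitCycles > BufferSize / 2.0)
    return WaitColor::Yellow;
  return WaitColor::Default;
}
```

For a scheduler buffer of 40 entries, an average wait of 40 cycles would be flagged RED, 21 cycles YELLOW, and 20 cycles would keep the default color.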


221 Commits