llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Xiang1 Zhang	38ab3a2d9f	[X86] Support AMX fast register allocation Differential Revision: https://reviews.llvm.org/D100026	2021-05-08 14:21:11 +08:00
Arthur Eubanks	d59e5b412e	Fix build after 34a8a437b	2021-05-07 23:18:44 -07:00
Xiang1 Zhang	f668ad4ced	Revert "[X86] Support AMX fast register allocation" This reverts commit 77e2e5e07d01fe0b83c39d0c527c0d3d2e659146.	2021-05-08 13:43:32 +08:00
Xiang1 Zhang	fe856bad78	[X86] Support AMX fast register allocation	2021-05-08 13:27:21 +08:00
Michael Liao	6d153ca7f4	Replace a remaining CRLF with LF. NFC.	2021-05-08 01:09:15 -04:00
Arthur Eubanks	b987f39d75	[NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose Printing pass manager invocations is fairly verbose and not super useful. This allows us to remove DebugLogging from pass managers and PassBuilder since all logging (aside from analysis managers) goes through instrumentation now. This has the downside of never being able to print the top level pass manager via instrumentation, but that seems like a minor downside. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D101797	2021-05-07 21:51:47 -07:00
RamNalamothu	61d8a36289	[DebugInfo] UnwindTable::create() should not add empty rows to CFI unwind table UnwindTable::parseRows() may return successfully if the CFIProgram has either no CFI instructions or only DW_CFA_nop instructions and the UnwindRow return argument will be empty. But currently, the callers are not checking for this case which is leading to incorrect dumps in the unwind tables in such cases i.e. CFA=unspecified Reviewed By: clayborg Differential Revision: https://reviews.llvm.org/D101892	2021-05-08 10:19:02 +05:30
Arthur Eubanks	6acb684b54	[lit] Bump up the Windows process cap from 32 to 60 At 61 or over, I see messages like File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait res = _winapi.WaitForMultipleObjects(L, False, timeout) ValueError: need at most 63 handles, got a sequence of length 64 60 seems to work for me. If this causes issues for anybody else, feel free to revert.	2021-05-07 18:13:38 -07:00
Arthur Eubanks	9df931e590	Revert "lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162" This reverts commit d319005a3746a7661c8c9a3302266b6ff7cf61be. Causing messages like: File "...\Python\Python39\lib\multiprocessing\connection.py", line 816, in _exhaustive_wait res = _winapi.WaitForMultipleObjects(L, False, timeout) ValueError: need at most 63 handles, got a sequence of length 74	2021-05-07 18:00:11 -07:00
Arthur Eubanks	1a92538daa	[gn build] Manually port 5b158093e	2021-05-07 17:54:32 -07:00
Amara Emerson	9146866d14	[AArch64][GlobalISel] Create a new minimal combiner pass just for -O0. We never bothered to have a separate set of combines for -O0 in the prelegalizer before. This results in some minor performance hits for a mode where performance isn't a concern (although not regressing code size significantly is still preferable). This also removes the CSE option since we don't need it for -O0. Through experiments, I've arrived at a set of combines that gets the most code size improvement at -O0, while reducing the amount of time spent in the combiner by around 35% give or take. Differential Revision: https://reviews.llvm.org/D102038	2021-05-07 17:01:27 -07:00
Amara Emerson	818c390c9c	[GlobalISel] Don't form zero/sign extending loads for atomics. For importing patterns, we only support matching G_LOAD, not G_ZEXTLOAD or G_SEXTLOAD. Differential Revision: https://reviews.llvm.org/D101932	2021-05-07 16:41:48 -07:00
Arthur Eubanks	86faf963ab	[NewPM] Move analysis invalidation/clearing logging to instrumentation We're trying to move DebugLogging into instrumentation, rather than being part of PassManagers/AnalysisManagers. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D102093	2021-05-07 15:25:31 -07:00
Jessica Paquette	f2be584bf8	[AArch64][GlobalISel] Legalize narrow type G_CTPOPs Using `clampScalar` here because we ought to mark s128 as custom eventually. (Right now, it will just fall back.) With this legalization, we get the same code as SDAG: https://godbolt.org/z/TneoPKrKG Differential Revision: https://reviews.llvm.org/D100908	2021-05-07 14:52:23 -07:00
Adrian Prantl	aade3db1e3	Fix the module-enabled build by removing a redundant type definition.	2021-05-07 14:45:17 -07:00
Florian Hahn	212afd7758	[LV] Remove reference of PHI from comment, they are not recorded (NFC). The comment incorrectly states that the PHI is recorded. That's not accurate, only the recipe for the incoming value is recorded. Suggested post-commit for 4ba8720f8844.	2021-05-07 21:34:23 +01:00
Andrea Di Biagio	ba5ec98548	[MCA][RegisterFile] Fix register class check for move elimination (PR50265) The register file should always check if the destination register is from a register class that allows move elimination. Before this change, the check on the register class was only performed in a few very specific cases. However, it should have always been performed. This patch fixes the issue. Note that none of the upstream scheduling models is currently affected by this bug, so there is no test for it. The issue was found by Roman while working on the znver3 model. I was able to reproduce the issue locally by tweaking the btver2 model. I then verified that this patch fixes the issue.	2021-05-07 21:30:25 +01:00
Florian Hahn	2803fff409	[LV] Assert if trying to sink replicate region into another region (NFC) Currently sinking a replicate region into another replicate region is not supported. Add an assert, to make the problem more obvious, should it occur. Discussed post-commit for ccebf7a1096a.	2021-05-07 21:25:35 +01:00
Florian Hahn	d1b5132397	[LV] Rename Region to TargetRegion, similar to SinkRegion (NFC). Adjust the name to make it clearer this is the region containing the target recipe, similar to SinkRegion below. Suggested post-commit for ccebf7a1096a.	2021-05-07 21:25:35 +01:00
Arthur Eubanks	4eb0ba33d1	Revert "[DebugInfo] Fix updateDbgUsersToReg to support DBG_VALUE_LIST" This reverts commit 0791f968fee259e5c34523167bd58179b8b081c2. Causing crashes: https://crbug.com/1206764	2021-05-07 12:05:16 -07:00
Florian Hahn	df7e45dd98	[SCEV] Be more careful when traversing phis in isImpliedViaMerge. I think currently isImpliedViaMerge can incorrectly return true for phis in a loop/cycle, if the found condition involves the previous value of Consider the case in exit_cond_depends_on_inner_loop. At some point, we call (modulo simplifications) isImpliedViaMerge(<=, %x.lcssa, -1, %call, -1). The existing code tries to prove IncV <= -1 for all incoming values IncV using the found condition (%call <= -1). At the moment this succeeds, but only because it does not compare the same runtime value. The found condition checks the value of the last iteration, but the incoming value is from the previous iteration. Hence we incorrectly determine that the previous value was <= -1, which may not be true. I think we need to be more careful when looking at the incoming values here. In particular, we need to rule out that a found condition refers to any value that may refer to one of the previous iterations. I'm not sure there's a reliable way to do so (that also works for irreducible control flow). So for now this patch adds an additional requirement that the incoming value must properly dominate the phi block. This should ensure the values do not change in a cycle. I am not entirely sure if this will catch all cases and I appreciate a thorough second look in that regard. Alternatively we could also unconditionally bail out in this case, instead of checking the incoming values. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D101829	2021-05-07 19:52:29 +01:00
Fangrui Song	e03ea6bfdf	[unittest] Fix -Wunused-variable after D94717	2021-05-07 11:42:16 -07:00
Krzysztof Parzyszek	fd58086fcb	Allow empty value list in propagateMetadata(Inst, ArrayOf...) This will allow writing propagateMetadata(Inst, collectInterestingValues(...)) without concern about empty lists. In case of an empty list, Inst is returned without any changes.	2021-05-07 13:20:50 -05:00
Fangrui Song	9289990558	Internalize some cl::opt global variables or move them under namespace llvm	2021-05-07 11:15:43 -07:00
Saleem Abdulrasool	363580b4e0	lit: revert 134b103fc0f3a995d76398bf4b029d72bebe8162 Revert the 32-process cap on Windows. When testing with Swift, we found that there was a time reduction for testing with the higher load. This should hopefully not matter much in practice. In the case that the original problem with python remains with a high subprocess count, we can easily revert this change.	2021-05-07 10:22:43 -07:00
Roman Lebedev	6449ef04c8	[X86] AMD Zen 3: mark XMM/YMM (but not MMX!) reg moves as eliminatible in RegisterFile	2021-05-07 20:11:21 +03:00
Roman Lebedev	5f3fe26c82	[X86] AMD Zen 3: MOVSX32rr32 is a zero-cycle move It measures as such, and the reference docs agree. I can't easily add a MCA test, because there's no mnemonic for it, it can only be disassembled or created as a MCInst.	2021-05-07 20:11:20 +03:00
Fangrui Song	0180f4a1d1	[AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local Similar to X86 D73230 & 46788a21f9152be3950e57dc526454655682bdd4 With this change, we can set dso_local in clang's -fpic -fno-semantic-interposition mode, for default visibility external linkage non-ifunc-non-COMDAT definitions. For such dso_local definitions, variable access/taking the address of a function/calling a function will go through a local alias to avoid GOT/PLT. Note: the 'S' inline assembly constraint refers to an absolute symbolic address or a label reference (D46745). Differential Revision: https://reviews.llvm.org/D101872	2021-05-07 09:44:26 -07:00
Whitney Tsang	e5ca2592d4	[LoopNest] Consider loop nest with inner loop guard using outer loop induction variable to be perfect This patch allows more conditional branches to be considered as loop guards, and so more loop nests can be considered perfect. Reviewed By: bmahjour, sidbav Differential Revision: https://reviews.llvm.org/D94717	2021-05-07 16:04:18 +00:00
Simon Pilgrim	3ceb0303a8	[X86] combineXor - limit fold to non-opaque constants (PR50254) Ensure we don't try to fold when one might be an opaque constant - the constant fold will fail and then the reverse fold will happen in DAGCombine.....	2021-05-07 16:39:24 +01:00
Roman Lebedev	37c9dcd0cf	[X86] AMD Zen 3: _REV variants of zero-cycle moves are also zero-cycle (PR50261) Sometimes the disassembler picks _REV variants of instructions over the plain ones, which in this case exposed an issue that the _REV variants aren't being modelled as optimizable moves.	2021-05-07 18:27:40 +03:00
Roman Lebedev	2eec1309e5	[NFC][X86][MCA] AMD Zen3: add test for zero-cycle X87 move	2021-05-07 18:27:40 +03:00
Joseph Tremoulet	b501eafd98	BasicAA: Recognize inttoptr as isEscapeSource Pointers escape when converted to integers, so a pointer produced by converting an integer to a pointer must not be a local non-escaping object. Reviewed By: nikic, nlopes, aqjune Differential Revision: https://reviews.llvm.org/D101541	2021-05-07 07:48:50 -07:00
Sanjay Patel	6ec9b04dd2	[AArch64] add test for missed vectorization; NFC This is a reduction of the example in: https://llvm.org/PR50256	2021-05-07 10:45:11 -04:00
Roman Lebedev	01a7d33cb8	[NFC][X86][MCA] AMD Zen3 Decrease iteration count in reg-move-elimination tests Drop it just enough so it still produces the right IPC.	2021-05-07 17:06:45 +03:00
Roman Lebedev	05dc778f5c	[X86] AMD Zen 3: throughput for renameable XMM/YMM moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:45 +03:00
Roman Lebedev	f8e6315b4a	[X86] AMD Zen 3: AVX YMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:45 +03:00
Roman Lebedev	f741d942f9	[X86] AMD Zen 3: AVX XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers.	2021-05-07 17:06:44 +03:00
Roman Lebedev	8c8821fc73	[X86] AMD Zen 3: SSE XMM moves are zero-cycle I've verified this with llvm-exegesis. This is not limited to zero registers. Refs: AMD SOG 19h, 2.9.4 Zero Cycle Move The processor is able to execute certain register to register mov operations with zero cycle delay. Agner, 22.13 Instructions with no latency Register-to-register move instructions are resolved at the register rename stage without using any execution units. These instructions have zero latency. It is possible to do six such register renamings per clock cycle, and it is even possible to rename the same register multiple times in one clock cycle.	2021-05-07 17:06:44 +03:00
Roman Lebedev	e720a8cc78	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX YMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	08ef520ecb	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable AVX XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	944bd39b12	[NFC][X86][MCA] AMD Zen 3: Add tests for renameable SSE XMM moves	2021-05-07 17:06:44 +03:00
Roman Lebedev	adf9a78691	[X86] AMD Zen 3: throughput for renameable GPR moves is 6 They are resolved at the register rename stage without using any execution units.	2021-05-07 17:06:43 +03:00
Roman Lebedev	87a05050e0	[NFC][X86] AMD Zen 3: move sched classes for renameable moves together	2021-05-07 17:06:43 +03:00
Roman Lebedev	28604fa5a6	[NFC][X86][MCA] Increase iteration count in reg move elimination tests So the IPC actually stabilizes at 6.	2021-05-07 17:06:43 +03:00
Stephen Tozer	77766b7092	Reapply "[DebugInfo] Drop DBG_VALUE_LISTs with an excessive number of debug operands" Reapply b623df3c, which was reverted while reverting a different patch with a breaking change. There are no underlying issues with this patch, so no changes have been made to the original patch. This reverts commit b11e4c990771541e440861f017afea7b4ba162f4.	2021-05-07 14:55:02 +01:00
Simon Pilgrim	9edfcde2eb	[CodeGen] Ensure UserValue::getDebugLoc() and UserLabel::getDebugLoc() consistently return a const reference NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Simon Pilgrim	b5e4cc0124	[DAG] Ensure all SD classes consistently return a const reference with getDebugLoc(). NFCI. Avoids a lot of unnecessary tracking increments/decrements of the underlying TrackingMDNodeRef.	2021-05-07 14:48:23 +01:00
Benjamin Kramer	f2e0efd85f	Retire TargetRegisterInfo::getSpillAlignment getSpillAlign does the same thing.	2021-05-07 15:16:22 +02:00
Sebastian Neubauer	154e1ab9f4	[AMDGPU] Restrict immediate scratch offsets gfx9 does not work with negative offsets, gfx10 works only with aligned negative offsets, but not with unaligned negative offsets. This is slightly more conservative than needed, gfx9 does support negative offsets when a VGPR address is used and gfx10 supports negative, unaligned offsets when an SGPR address is used, but we do not make use of that with this patch. Differential Revision: https://reviews.llvm.org/D101292	2021-05-07 14:51:32 +02:00


215401 Commits