llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00

Author	SHA1	Message	Date
Carl Ritson	037097a4f3	[AMDGPU] Add SI_EARLY_TERMINATE_SCC0 for early terminating shader Add pseudo instruction to allow early termination of pixel shader anywhere based on the value of SCC. The intention is to use this when a mask of live lanes is updated, e.g. live lanes in WQM pass. This facilitates early termination of shaders even when EXEC is incomplete, e.g. in non-uniform control flow. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D88777	2021-01-13 13:29:05 +09:00
Jonas Devlieghere	abe71cac9b	[dsymutil] Fix spurious space in REQUIRES: line This test is incorrectly running on non-darwin hosts.	2021-01-12 20:13:44 -08:00
Jonas Devlieghere	06c9cee537	[dsymutil] s/dwarfdump/llvm-dwarfdump/ in test	2021-01-12 19:59:13 -08:00
Jonas Devlieghere	db48aa1fa2	[dsymutil] Copy eh_frame content into the dSYM companion file. Copy over the __eh_frame from the binary into the dSYM. This helps kernel developers that are working with only dSYMs (i.e. no binaries) when debugging a core file. This only kicks in when the __eh_frame exists in the linked binary. Most of the time ld64 will remove the section in favor of compact unwind info. When it is emitted, it's generally small enough and should not bloat the dSYM. rdar://69774935 Differential revision: https://reviews.llvm.org/D94460	2021-01-12 19:50:34 -08:00
Serguei Katkov	90ddcdef81	[InlineSpiller] Re-tie operands if folding failed InlineSpiller::foldMemoryOperand unties registers before an attempt to fold and does not restore tied-ness in case of failure. I do not have a particular test for demo of invalid behavior. This is something of clean-up. It is better to keep the behavior correct in case some time in future it happens. Reviewers: reames, dantrushin Reviewed By: dantrushin, reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D94389	2021-01-13 10:31:43 +07:00
Lang Hames	6dd6e1ee1a	[Orc] Add a unit test for asynchronous definition generation.	2021-01-13 14:23:36 +11:00
Jonas Devlieghere	d79f808e52	[dsymutil] Warn on timestamp mismatch between object file and debug map Add a warning when the timestamp doesn't match between the object file and the debug map entry. We were already emitting such warnings for archive members and swift interface files. This patch also unifies the warning across all three. rdar://65614640 Differential revision: https://reviews.llvm.org/D94536	2021-01-12 18:58:10 -08:00
Hsiangkai Wang	c2ec94c281	[NFC] Use generic name for scalable vector stack ID. Differential Revision: https://reviews.llvm.org/D94471	2021-01-13 10:57:43 +08:00
Nico Weber	6de70967d2	[gn build] Reorganize libcxx/include/BUILD.gn a bit - Merge 6706342f48bea80 -- no more libcxx_needs_site_config, we now always need it - Since it was always off in practice, write_config bitrot. Unbitrot it so that it works - Remove copy step and let concat step write to final location immediately -- and fix copy destination directory As a side effect, libcxx/include/BUILD.gn now has only a single sources list, which means the cmake sync script should be able to automatically sync additions and removals of .h files. On the flipside, this means this file now must be updated after most changes to libcxx/include/__config_site.in, and looking through the last few months of changes this looks like it's going to be a wash.	2021-01-12 21:30:06 -05:00
Reid Kleckner	5f88a1c6fc	[PDB] Defer relocating .debug$S until commit time and parallelize it This is a pretty classic optimization. Instead of processing symbol records and copying them to temporary storage, do a first pass to measure how large the module symbol stream will be, and then copy the data into place in the PDB file. This requires deferring relocation until much later, which accounts for most of the complexity in this patch. This patch avoids copying the contents of all live .debug$S sections into heap memory, which is worth about 20% of private memory usage when making PDBs. However, this is not an unmitigated performance win, because it can be faster to read dense, temporary, heap data than it is to iterate symbol records in object file backed memory a second time. Results on release chrome.dll: peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%) wall-j1: 0m30.844s -> 0m32.094s (slightly slower) wall-j3: 0m20.968s -> 0m20.312s (slightly faster) wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster) I gathered similar numbers for a debug, component build of content.dll in Chrome, and the performance impact of this change was in the noise. The memory usage reduction was visible and similar. Because of the new parallelism in the PDB commit phase, more cores make the new approach faster. I'm assuming that most C++ developer machines these days are at least quad core, so I think this is a win. Differential Revision: https://reviews.llvm.org/D94267	2021-01-12 17:46:29 -08:00
Yuanfang Chen	5d4fb15292	[Coroutine] Update promise object's final layout index promise is a header field but it is not guaranteed that it would be the third field of the frame due to `performOptimizedStructLayout`. Reviewed By: lxfind Differential Revision: https://reviews.llvm.org/D94137	2021-01-12 17:44:02 -08:00
Luo, Yuanke	eca875f881	[X86][AMX] Prohibit pointer cast on load. The load/store instruction will be transformed to amx intrinsics in the pass of AMX type lowering. Prohibiting the pointer cast makes that pass happy. Differential Revision: https://reviews.llvm.org/D94372	2021-01-13 09:39:19 +08:00
Nico Weber	e05a72deb7	[gn build] (manually) port 79f99ba65d96	2021-01-12 20:30:56 -05:00
Juneyoung Lee	3333999419	[DAGCombiner] Fold BRCOND(FREEZE(COND)) to BRCOND(COND) This patch resolves the suboptimal codegen described in http://llvm.org/pr47873 . When CodeGenPrepare lowers select into a conditional branch, a freeze instruction is inserted. It is then translated to `BRCOND(FREEZE(SETCC))` in SelDag. The `FREEZE` in the middle of `SETCC` and `BRCOND` was causing a suboptimal code generation however. This patch adds `BRCOND(FREEZE(cond))` -> `BRCOND(cond)` fold to DAGCombiner to remove the `FREEZE`. To make this optimization sound, `BRCOND(UNDEF)` simply should nondeterministically jump to the branch or not, rather than raising UB. It wasn't clear what happens when the condition was undef according to the comments in ISDOpcodes.h, however. I updated the comments of `BRCOND` to make it explicit (as well as `BR_CC`, which is also a conditional branch instruction). Note that it diverges from the semantics of `br` instruction in IR, which is explicitly UB. Since the UB semantics was necessary to explain optimizations that use branching conditions, and SelDag doesn't seem to have such optimization, I think this divergence is okay. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92015	2021-01-13 09:36:52 +09:00
Juneyoung Lee	38ff89f219	[LangRef] State that a nocapture pointer cannot be returned This is a small patch stating that a nocapture pointer cannot be returned. Discussed in D93189. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94386	2021-01-13 09:30:54 +09:00
Joe Nash	521d6a1785	[AMDGPU] Add _e64 suffix to VOP3 Insts Previously, instructions which could be expressed as VOP3 in addition to another encoding had a _e64 suffix on the tablegen record name, while those only available as VOP3 did not. With this patch, all VOP3s will have the _e64 suffix. The assembly does not change, only the mir. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D94341 Change-Id: Ia8ec8890d47f8f94bbbdac43745b4e9dd2b03423	2021-01-12 18:33:18 -05:00
Mircea Trofin	e0d59fcd1e	[NFC] Disallow unused prefixes under MC/AMDGPU This patches remaining tests, and patches lit.local.cfg to block future such cases (until we flip FileCheck's flag) Differential Revision: https://reviews.llvm.org/D94556	2021-01-12 15:24:44 -08:00
Jessica Paquette	945db19648	[MIPatternMatch] Add matcher for G_PTR_ADD Add a matcher which recognizes G_PTR_ADD and add a test. Differential Revision: https://reviews.llvm.org/D94348	2021-01-12 15:21:19 -08:00
Hongtao Yu	ab8997cebd	Add sample-profile-suffix-elision-policy attribute with -funique-internal-linkage-names. Adding sample-profile-suffix-elision-policy attribute to functions whose linkage names are uniquefied so that their unique name suffix won't be trimmed when applying AutoFDO profiles. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94455	2021-01-12 15:15:53 -08:00
Craig Topper	e423e154e0	[RISCV] Remove '.mask' from vcompress intrinsic name. NFC It has a mask argument, but isn't a masked instruction. It doesn't use the mask policy or the v0.t syntax.	2021-01-12 14:46:16 -08:00
Nathan James	0bd7718aea	[ADT][NFC] Use empty base optimisation in BumpPtrAllocatorImpl Most uses of this class just use the default MallocAllocator. As this contains no fields, we can use the empty base optimisation for BumpPtrAllocatorImpl and save 8 bytes of padding for most use cases. This prevents using a class that is marked as `final` as the `AllocatorT` template argument. If one must use an allocator that has been marked as `final`, the simplest way around this is a proxy class. The class should have all the methods that `AllocatorBase` expects and should forward the calls to your own allocator instance. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D94439	2021-01-12 22:43:48 +00:00
Mircea Trofin	a8191b4ff3	[NFC] Disallow unused prefixes in MC/AMDGPU 1 out of 2 patches. Differential Revision: https://reviews.llvm.org/D94553	2021-01-12 14:31:22 -08:00
Matt Arsenault	156505f011	AMDGPU: Remove wrapper only call limitation This seems to only have overridden cold handling, which we probably shouldn't do. As far as I can tell the wrapper library functions are still inlined as appropriate.	2021-01-12 17:12:49 -05:00
Martin Storsjö	a92bacb796	[AArch64] [Windows] Properly add :lo12: reloc specifiers when generating assembly This makes sure that assembly output actually can be assembled. Set the correct MCExpr relocations specifier VK_PAGEOFF - and also set VK_PAGE consistently even though it's not visible in the assembly output. Differential Revision: https://reviews.llvm.org/D94365	2021-01-12 23:56:03 +02:00
modimo	d56ed46145	[Inliner] Change inline remark format and update ReplayInlineAdvisor to use it This change modifies the source location formatting from: LineNumber.Discriminator to: LineNumber:ColumnNumber.Discriminator The motivation here is to enhance location information for inline replay that currently exists for the SampleProfile inliner. This will be leveraged further in inline replay for the CGSCC inliner in the related diff. The ReplayInlineAdvisor is also modified to read the new format and now takes into account the callee for greater accuracy. Testing: ninja check-llvm Reviewed By: mtrofin Differential Revision: https://reviews.llvm.org/D94333	2021-01-12 13:43:48 -08:00
Nikita Popov	491225a262	[InstCombine] Handle logical and/or in assume optimization assume(a && b) can be converted to assume(a); assume(b) even if the condition is logical. Same for assume(!(a || b)).	2021-01-12 22:36:40 +01:00
Michael Munday	f8a20579df	[RISCV] Legalize select when Zbt extension available The custom expansion of select operations in the RISC-V backend interferes with the matching of cmov instructions. Legalizing select when the Zbt extension is available solves that problem. Reviewed By: lenary, craig.topper Differential Revision: https://reviews.llvm.org/D93767	2021-01-12 21:24:38 +00:00
Nikita Popov	ceadd77481	[InstCombine] Add tests for logical and/or poison implication (NFC) These tests cover some cases where we can fold select to and/or based on poison implication logic.	2021-01-12 22:18:51 +01:00
Craig Topper	1cc849ccbc	[RISCV] Add double test cases to vfmerge-rv32.ll. NFC	2021-01-12 13:09:48 -08:00
Sanjay Patel	b41334d30a	[SLP] reduce code duplication while processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	4f0882e467	[SLP] rename variable to improve readability; NFC The OperationData in the 2nd block (visiting the operands) is completely independent of the 1st block.	2021-01-12 16:03:57 -05:00
Sanjay Patel	edb88f4fad	[SLP] reduce code duplication in processing reductions; NFC	2021-01-12 16:03:57 -05:00
Sanjay Patel	a486f56de3	[SLP] reduce code duplication while matching reductions; NFC	2021-01-12 16:03:57 -05:00
Philip Reames	42d8098f91	[LV] Weaken spuriously strong assert in LoopVersioning LoopVectorize uses some utilities on LoopVersioning, but doesn't actually use it for, you know, versioning. As a result, the precondition LoopVersioning expects is too strong for this user. At the moment, LoopVectorize supports any loop with a unique exit block, so check the same precondition here. Really, the whole class structure here is a mess. We should separate the actual versioning from the metadata updates, but that's a bigger problem.	2021-01-12 12:57:13 -08:00
Nikita Popov	302c879f6c	[InstCombine] Duplicate tests for logical and/or (NFC) This replicates existing and/or tests to also test variants using select. This should help us get a more accurate view on which optimizations we're missing if we disable the select -> and/or fold.	2021-01-12 21:50:41 +01:00
Philip Reames	e62e2effb0	[LV] Relax assumption that LCSSA implies single entry This relates to the ongoing effort to support vectorization of multiple exit loops (see D93317). The previous code assumed that LCSSA phis were always single entry before the vectorizer ran. This was correct, but only because the vectorizer allowed only a single exiting edge. There's nothing in the definition of LCSSA which requires single entry phis. A common case where this comes up is with a loop with multiple exiting blocks which all reach a common exit block. (e.g. see the test updates) Differential Revision: https://reviews.llvm.org/D93725	2021-01-12 12:34:52 -08:00
Nikita Popov	b14fd31fef	[InstCombine] Regenerate test checks (NFC)	2021-01-12 21:26:42 +01:00
Florian Hahn	89fc038707	[FunctionAttrs] Derive willreturn for fns with `readonly` & `mustprogress`. Similar to D94125, derive `willreturn` for functions that are `readonly` and `mustprogress` in FunctionAttrs. To quote the reasoning from D94125: Since D86233 we have `mustprogress` which, in combination with `readonly`, implies `willreturn`. The idea is that every side-effect has to be modeled as a "write". Consequently, `readonly` means there is no side-effect, and `mustprogress` guarantees that we cannot "loop" forever without side-effect. Reviewed By: jdoerfert, nikic Differential Revision: https://reviews.llvm.org/D94502	2021-01-12 20:02:34 +00:00
David Truby	7a27e1780d	[clang][aarch64] Precondition isHomogeneousAggregate on isCXX14Aggregate MSVC on WoA64 includes isCXX14Aggregate in its definition. This is de-facto specification on that platform, so match msvc's behaviour. Fixes: https://bugs.llvm.org/show_bug.cgi?id=47611 Co-authored-by: Peter Waller <peter.waller@arm.com> Differential Revision: https://reviews.llvm.org/D92751	2021-01-12 19:44:01 +00:00
Nikita Popov	2f31020aa4	[InstSimplify] Don't fold gep p, -p to null This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=44403. Folding gep p, q-p to q is only legal if p and q have the same provenance. This fold should probably be guarded by something like getUnderlyingObject(p) == getUnderlyingObject(q). This patch is a partial fix that removes the special handling for gep p, 0-p, which will fold to a null pointer, which would certainly not pass an underlying object check (unless p is also null, in which case this would fold trivially anyway). Folding to a null pointer is particularly problematic due to the special handling it receives in many places, making end-to-end miscompiles more likely. Differential Revision: https://reviews.llvm.org/D93820	2021-01-12 20:24:23 +01:00
Florian Hahn	60e7732238	[FunctionAttrs] Precommit tests for willreturn inference. Tests for D94502.	2021-01-12 19:16:50 +00:00
Craig Topper	fc77e995ec	[RISCV] Use vmerge.vim for llvm.riscv.vfmerge with a 0.0 scalar operand. We can use a 0 immediate to avoid needing to materialize 0 into an FPR first. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D94459	2021-01-12 11:08:26 -08:00
Arthur Eubanks	d28c8be642	[NewPM] Run non-trivial loop unswitching under -O2/3/s/z Fixes https://bugs.llvm.org/show_bug.cgi?id=48715. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D94448	2021-01-12 11:04:40 -08:00
Craig Topper	6008e3ca8a	[LegalizeDAG][RISCV][PowerPC][AMDGPU][WebAssembly] Improve expansion of SETONE/SETUEQ on targets without SETO/SETUO. If SETO/SETUO aren't legal, they'll be expanded and we'll end up with 3 comparisons. SETONE is equivalent to (SETOGT || SETOLT) so if one of those operations is supported use that expansion. We don't need both since we can commute the operands to make the other. SETUEQ can be implemented with !(SETOGT || SETOLT) or (SETULE && SETUGE). I've only implemented the first because it didn't look like most of the affected targets had legal SETULE/SETUGE. Reviewed By: frasercrmck, tlively, nemanjai Differential Revision: https://reviews.llvm.org/D94450	2021-01-12 10:45:03 -08:00
Dávid Bolvanský	58595a7835	[instCombine] Add (A ^ B) | ~(A | B) -> ~(A & B) define i32 @src(i32 %x, i32 %y) { %0: %xor = xor i32 %y, %x %or = or i32 %y, %x %neg = xor i32 %or, 4294967295 %or1 = or i32 %xor, %neg ret i32 %or1 } => define i32 @tgt(i32 %x, i32 %y) { %0: %and = and i32 %x, %y %neg = xor i32 %and, 4294967295 ret i32 %neg } Transformation seems to be correct! https://alive2.llvm.org/ce/z/Cvca4a	2021-01-12 19:29:17 +01:00
Dávid Bolvanský	f16a8fbbac	[Tests] Add tests for new InstCombine OR transformation, NFC	2021-01-12 19:29:17 +01:00
Michał Górny	69e423efd5	[llvm] [cmake] Remove obsolete /usr/local hack for *BSD Remove the hack adding /usr/local paths on FreeBSD and DragonFlyBSD. It does not seem to be necessary today, and it breaks cross builds. Differential Revision: https://reviews.llvm.org/D94491	2021-01-12 19:26:04 +01:00
Cullen Rhodes	b30ad48824	[SVE][NFC] Regenerate a few CodeGen tests Regenerated using llvm/utils/update_llc_test_checks.py as part of D94504, committing separately to reduce the diff for D94504.	2021-01-12 18:10:36 +00:00
Simon Pilgrim	ce76c6de45	[AMDGPU] Regenerate umax crash test	2021-01-12 18:02:15 +00:00
Simon Pilgrim	873ac18cc1	[X86] Regenerate sdiv_fix_sat.ll + udiv_fix_sat.ll tests Adding missing libcall PLT qualifiers	2021-01-12 17:25:30 +00:00
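
Two of the commits above rest on small logical identities: the InstCombine fold of (A ^ B) | ~(A | B) to ~(A & B) in 58595a7835, and the SETONE/SETUEQ expansions in 6008e3ca8a. The sketch below is not LLVM code and is no substitute for the Alive2 proof linked in the commit; it is a brute-force sanity check in Python. The function names and the 8-bit width are assumptions made for illustration, and the floating-point predicates are modelled with ordinary Python comparisons, where any comparison against NaN is false.

```python
import itertools
import math


def fold_src(a, b, bits=8):
    """Left-hand side of the fold: (A ^ B) | ~(A | B), truncated to `bits`."""
    mask = (1 << bits) - 1
    return ((a ^ b) | (~(a | b) & mask)) & mask


def fold_tgt(a, b, bits=8):
    """Right-hand side of the fold: ~(A & B), truncated to `bits`."""
    mask = (1 << bits) - 1
    return ~(a & b) & mask


def bitwise_fold_holds(bits=8):
    """Exhaustively compare both sides over every pair of `bits`-wide values."""
    values = range(1 << bits)
    return all(
        fold_src(a, b, bits) == fold_tgt(a, b, bits)
        for a, b in itertools.product(values, repeat=2)
    )


def setone(a, b):
    """Ordered and not equal: false if either operand is NaN."""
    return not math.isnan(a) and not math.isnan(b) and a != b


def setueq(a, b):
    """Unordered or equal: true if either operand is NaN."""
    return math.isnan(a) or math.isnan(b) or a == b


def fp_expansions_hold():
    """Check SETONE == (OGT || OLT) and SETUEQ == !(OGT || OLT) on samples."""
    samples = [float("nan"), float("-inf"), -1.5, -0.0, 0.0, 2.0, float("inf")]
    for a, b in itertools.product(samples, repeat=2):
        ogt_or_olt = (a > b) or (a < b)  # both are false when a NaN is involved
        if setone(a, b) != ogt_or_olt:
            return False
        if setueq(a, b) != (not ogt_or_olt):
            return False
    return True
```

Calling `bitwise_fold_holds()` and `fp_expansions_hold()` should both return `True`. The bitwise identity acts on each bit position independently, so an exhaustive 8-bit check is already suggestive for wider integers, while the floating-point check only covers the listed sample values.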

209596 Commits