llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00

Author	SHA1	Message	Date
Markus Böck	20787e83b0	[PR48898][CMake] Support MinGW Toolchain tools in llvm_ExternalProject_Add Windows is in the unique position of having two drivers, clang-cl and normal GNU clang, depending on whether a GNU or MSVC target is used. The current implementation with the USE_TOOLCHAIN argument assumes that when CMAKE_SYSTEM_NAME is set to Windows that clang-cl should be used, which is the incorrect choice when targeting a GNU environment. This patch solves this problem by adding an optional TARGET_TRIPLE argument to llvm_ExternalProject_Add, which sets the various CMAKE_<LANG>_COMPILER_TARGET variables. Additionally, if the triple is detected as an MSVC environment, clang-cl and similar MSVC specific tools will be used instead of the GNU tools.	2021-03-02 22:45:05 +01:00
Heejin Ahn	af019c3b48	[WebAssembly] Fix more ExceptionInfo grouping bugs This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by D97247. These two bugs are not easy to split into two different CLs, because tests that fail for one also tend to fail for the other. - In D97247, when fixing `ExceptionInfo` grouping by taking out the unwind destination's exception from the unwind src's exception, we just iterated the BBs in the function order, but this was incorrect; this changes it to dominator tree preorder. Please refer to the comments in the code for the reason and an example. - After this subexception-taking-out fix, there still can be remaining BBs we have to take out. When Exception B is taken out of Exception A (because EHPad B is the unwind destination of EHPad A), there can still be BBs within Exception A that are reachable from Exception B, which also should be taken out. Please refer to the comments in the code for more detailed explanation on why this can happen. To make this possible, this splits `WebAssemblyException::addBlock` into two parts: adding to a set and adding to a vector. We need to iterate on BBs within a `WebAssemblyException` to fix this, so we add BBs to sets first. But we add BBs to vectors later after we fix all incorrectness because deleting BBs from vectors is expensive. I considered removing the vector from `WebAssemblyException`, but it was not easy because this class has to maintain a similar interface with `MachineLoop` to be wrapped into a single interface `SortRegion`, which is used in CFGSort. Other misc. drive-by fixes: - Make `WebAssemblyExceptionInfo` not even run when wasm EH is not used or the function doesn't have any EH pads, so as not to waste time - Add `LLVM_DEBUG` lines for easy debugging - Fix `preds` comments in cfg-stackify-eh.ll - Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll Fixes https://github.com/emscripten-core/emscripten/issues/13554. 
Reviewed By: dschuff, tlively Differential Revision: https://reviews.llvm.org/D97677	2021-03-02 13:44:09 -08:00
Nikita Popov	888a67345b	[LICM] Make promotion faster Even when MemorySSA-based LICM is used, an AST is still populated for scalar promotion. As the AST has quadratic complexity, a lot of time is spent in this step despite the existing access count limit. This patch optimizes the identification of promotable stores. The idea here is pretty simple: We're only interested in must-alias mod sets of loop invariant pointers. As such, only populate the AST with loop-invariant loads and stores (anything else is definitely not promotable) and then discard any sets which alias with any of the remaining, definitely non-promotable accesses. If we promoted something, check whether this has made some other accesses loop invariant and thus possible promotion candidates. This is much faster in practice, because we need to perform AA queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable) instead of O(NumTotal^2), and NumPromotable tends to be small. Additionally, promotable accesses have loop invariant pointers, for which AA is cheaper. This has a significant positive compile-time impact. We save ~1.8% geomean on CTMark at O3, with 6% on lencod in particular and 25% on individual files. Conceptually, this change is NFC, but may not be so in practice, because the AST is only an approximation, and can produce different results depending on the order in which accesses are added. However, there is at least no impact on the number of promotions (licm.NumPromoted) in test-suite O3 configuration with this change. Differential Revision: https://reviews.llvm.org/D89264	2021-03-02 22:10:48 +01:00
Yonghong Song	4e4cb187f5	BPF: Fix a bug in peephole TRUNC elimination optimization Andrei Matei reported a llvm11 core dump for his bpf program https://bugs.llvm.org/show_bug.cgi?id=48578 The core dump happens in LiveVariables analysis phase. #4 0x00007fce54356bb0 __restore_rt #5 0x00007fce4d51785e llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock, llvm::MachineInstr&) #6 0x00007fce4d519abe llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) #7 0x00007fce4d519ec6 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock, unsigned int) #8 0x00007fce4d51a4bf llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) The bug can be reproduced with llvm12 and latest trunk as well. Further analysis shows that there is a bug in BPF peephole TRUNC elimination optimization, which tries to remove unnecessary TRUNC operations (a <<= 32; a >>= 32). Specifically, the compiler did wrong transformation for the following patterns: %1 = LDW ... %2 = SLL_ri %1, 32 %3 = SRL_ri %2, 32 ... %3 ... %4 = SRA_ri %2, 32 ... %4 ... The current transformation did not check how many uses of %2 and did transformation like %1 = LDW ... ... %1 ... %4 = SRL_ri %2, 32 ... %4 ... and pseudo register %2 is used but not defined and caused LiveVariables analysis core dump. To fix the issue, when traversing back from SRL_ri to SLL_ri, check to ensure SLL_ri has only one use. Otherwise, don't do transformation. Differential Revision: https://reviews.llvm.org/D97792	2021-03-02 13:03:42 -08:00
Amara Emerson	17dd5ced91	[AArch64][GlobalISel] Enable use of the optsize predicate in the selector. To do this while supporting the existing functionality in SelectionDAG of using PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis dependencies to the instruction selector pass. Then, use the predicate to generate constant pool loads for f32 materialization, if we're targeting optsize/minsize. Differential Revision: https://reviews.llvm.org/D97732	2021-03-02 12:55:51 -08:00
Stefan Gränitz	ae81b5754a	[llvm-jitlink] Prevent missing symbols from JITLoaderGDB with MSVC mangling The issue came up on builder clang-x64-windows-msvc after 5182a7901a5d83dfd15021d01e8a1899910130ec	2021-03-02 21:44:54 +01:00
Sanjay Patel	0f5d46fc15	[SDAG] allow partial undef vector constants with select->logic folds This is an enhancement suggested in the original review/commit: D97730 / 7fce3322a283	2021-03-02 14:29:15 -05:00
Sanjay Patel	deff241ffa	[AArch64] add select tests with partial vector undefs; NFC	2021-03-02 14:29:15 -05:00
David Green	e8b666c3d9	[ARM] Use 0, not ZR during ISel for CSINC/INV/NEG Instead of converting the 0 into a ZR reg during lowering, do that with tablegen by matching the zero immediate. This when combined with other optimizations is more likely to use ZR and helps keep the DAG more easily optimizable. It should not otherwise affect code generation.	2021-03-02 19:01:14 +00:00
Jonas Paulsson	1ba442ae80	[SystemZ] Assign the full space for promoted and split outgoing args. When a large "irregular" (e.g. i96) integer call argument is converted to indirect, 64-bit parts are stored to the stack. The full stack space (e.g. i128) was not allocated prior to this patch, but rather just the exact space of the original type. This caused neighboring values on the stack to be overwritten. Thanks to Josh Stone for reporting this. Review: Ulrich Weigand Fixes https://bugs.llvm.org/show_bug.cgi?id=49322 Differential Revision: https://reviews.llvm.org/D97514	2021-03-02 12:56:47 -06:00
Nico Weber	0cde944f58	[gn build] fix llvm-jitlink tests on linux after ef2389235c5dec0	2021-03-02 13:41:09 -05:00
Joe Nash	f210e9dbe7	[AMDGPU] Make OMod explicit for V_CVT_{U,I}* Make OMod explicit instead of implied by HasModifiers in the operand list. Requires explicitly setting HasOMod=1 for irregular OMod usage in instruction V_CVT_{U,I}* Reviewed By: foad Differential Revision: https://reviews.llvm.org/D97587 Change-Id: I230e1476f529e816eec60e242531f23a99e3839f	2021-03-02 13:32:06 -05:00
Vy Nguyen	7abc52c73c	[lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD. Currently, it was deliberately implemented to not handle this case, but as it has turned out, we need this feature. The concrete use case is `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then reexports /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation The current implementation uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree. This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it. The right thing should be: - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD. - When loading from an actual dylib, no additional TBD documents need to be examined. - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file Differential Revision: https://reviews.llvm.org/D97438	2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek	1e0e652e78	[TableGen] Add IntrNoMerge as intrinsic property There is a function attribute 'nomerge' in addition to 'noduplicate' and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding intrinsic properties. This patch adds an intrinsic property for the 'nomerge' attribute. Differential Revision: https://reviews.llvm.org/D96364	2021-03-02 09:04:50 -08:00
Fraser Cormack	97dc903cfa	[RISCV] Support fixed-length INSERT_VECTOR_ELT This patch enables support for lowering INSERT_VECTOR_ELT on fixed-length vector types. The strategy follows that for scalable vector types. This patch also includes a quick fix to prevent the compiler infinitely looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97698	2021-03-02 16:48:38 +00:00
Alexey Bataev	5fcbc5f0f5	[Instcombine][NFC]Simplify logical reductions tests, NFC.	2021-03-02 08:27:42 -08:00
dfukalov	a698d3528e	[AA] Cache (optionally) estimated PartialAlias offsets. For the cases of two clobbering loads and one loaded object is fully contained in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that is actually more common case of partial overlap, it doesn't say anything about actual overlapping sizes. AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs with non-constant offsets. The change stores estimated relative offsets so they can be used further. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D93529	2021-03-02 19:04:15 +03:00
Nico Weber	bdc0a8276c	[gn build] (manually) port 99a6d003edbe	2021-03-02 10:31:59 -05:00
Nico Weber	f0eda1a2cd	[gn build] Port f47ff8cff1ed	2021-03-02 10:31:59 -05:00
Nico Weber	84bbbf8474	[gn build] Port ef2389235c5d	2021-03-02 10:31:59 -05:00
Tim Northover	3073fe6437	AArch64: report fp16 arithmetic is present for apple-a11 CPU. AArch64.td got it right, but the target-parser dropped it, leading to missing feature flags in Clang.	2021-03-02 15:07:18 +00:00
Simon Pilgrim	a8e2fbeea0	[DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI.	2021-03-02 15:01:33 +00:00
Alexey Bataev	48b45a5713	[SLP]Merge reorder and reuse shuffles. It is possible to merge reuse and reorder shuffles and reduce the total cost of the vectorization tree/number of final instructions. Differential Revision: https://reviews.llvm.org/D94992	2021-03-02 06:39:47 -08:00
Stefan Gränitz	0d86034335	[Orc] Fix MSVC error: conversion from 'initializer list' requires a narrowing	2021-03-02 15:34:36 +01:00
Sanjay Patel	b5d00c90ab	[SDAG] allow vector types for select->logic folds This prepares codegen for a change that will remove the identical folds from IR because they are not poison-safe. See D93065 / D97360 for details. We already generically support scalar types, and there are various target-specific transforms that overlap the vector folds. For example, x86 recognizes the and patterns, but not or. We can end up with 1 extra instruction there, but I think that is still preferred over the blendv alternative that loads a constant vector. If this is not optimal, then it should be fixed with a later transform (this change is not expected to result in any regressions because InstCombine currently does the same thing). Removing custom code and supporting undefs in constant-pattern-matching can be follow-up changes. Differential Revision: https://reviews.llvm.org/D97730	2021-03-02 09:25:10 -05:00
Stefan Gränitz	fb33a58b19	[Orc] Fix remaining memory size of slab allocator	2021-03-02 15:07:37 +01:00
Stefan Gränitz	e9a5668ec5	[docs][JITLink] Fix a typo (NFC)	2021-03-02 15:07:36 +01:00
Stefan Gränitz	4d74454f3a	[Orc] Extend lli debug support tests to JITLink	2021-03-02 15:07:36 +01:00
Stefan Gränitz	7dea8abe52	[lli] Add JITLink in-process debug support lli aims to provide both RuntimeDyld and JITLink as the dynamic linkers/loaders for its JIT implementations. And they both offer debugging via the GDB JIT interface, which builds on the two well-known symbol names `__jit_debug_descriptor` and `__jit_debug_register_code`. As these symbols must be unique across the linked executable, we can only define them in one of the libraries and make the other depend on it. OrcTargetProcess is a minimal stub for embedding a JIT client in remote executors. For the moment it seems reasonable to have the definition there and let ExecutionEngine depend on it, until we find a better solution. This is the second commit for the reviewed patch. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97339	2021-03-02 15:07:36 +01:00
Stefan Gränitz	44d6e1a3fb	[Orc] Add JITLink debug support plugin for ELF x86-64 Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that. The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a `DebugObject` from the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`. This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them. Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending `DebugObject` as registered for the corresponding MaterializationResponsibility. The `DebugObjectRegistrar` registers the `DebugObject` with the target process. 
`llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future. [1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97335	2021-03-02 15:07:35 +01:00
Stefan Gränitz	0e6bd8e63e	[Orc] Rename local variable to avoid confusion with equally-named class member (NFC)	2021-03-02 15:07:35 +01:00
Stefan Gränitz	511a7fc0e2	[Orc] Fix a file header (NFC)	2021-03-02 15:07:34 +01:00
Stefan Gränitz	e28df8cd15	[JITLink] LinkGraph::getName() can be const	2021-03-02 15:07:34 +01:00
Stefan Gränitz	7f7a8dc368	[JITLink] Remove some std::move(MemoryBufferRef) below createLinkGraphFromObject() (NFC)	2021-03-02 15:07:34 +01:00
Stefan Gränitz	c364052a21	[llvm-jitlink] Remove duplicate type definition (NFC)	2021-03-02 15:07:33 +01:00
Stefan Gränitz	0621b0f242	[lli] Add --jit-linker command line argument The argument value determines the dynamic linker to use (`default`, `rtdyld` or `jitlink`). The JITLink implementation only supports in-process JITing for now. This is the first commit for the reviewed patch. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97339	2021-03-02 15:07:33 +01:00
Simon Pilgrim	0d25232cae	[DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI. Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.	2021-03-02 13:29:31 +00:00
Benjamin Kramer	d61c913130	[MCParser] Bring back srcmanager diagnostics in AsmParser AsmParser may have no LLVMContext attached to it, which means after 5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 everything goes to stderr. Restore the old behavior.	2021-03-02 13:43:03 +01:00
Simon Pilgrim	54a3086783	[AMDGPU] Fix "initialization is never read" clang-tidy warnings. NFCI.	2021-03-02 12:06:24 +00:00
Fraser Cormack	d1351b0b35	[RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes The default expansion of CONCAT_VECTORS goes through the stack. This patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series of INSERT_SUBVECTOR nodes. Further optimizations are possible, but this is a good start. Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D97692	2021-03-02 11:13:59 +00:00
Jan Svoboda	3c9c0a88f1	[clang][cli] NFC: Rename marshalling multiclass The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.	2021-03-02 11:53:40 +01:00
Florian Hahn	f586accda1	[LV] Add test cases that require a larger number of RT checks. Precommit test cases for D75981.	2021-03-02 10:49:38 +00:00
Benjamin Kramer	2e5b7002dd	Revert "[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))" This reverts commit 925093d88ae74560a8e94cf66f95d60ea3ffa2d3. Causes an infinite loop when compiling some shuffles: $ cat bugpoint-reduced-simplified.ll target triple = "x86_64-unknown-linux-gnu" define void @foo() { entry: %0 = load i8, i8* undef, align 1 %broadcast.splatinsert = insertelement <16 x i8> poison, i8 %0, i32 0 %1 = icmp ne <16 x i8> %broadcast.splatinsert, zeroinitializer %2 = shufflevector <16 x i1> %1, <16 x i1> undef, <16 x i32> zeroinitializer %wide.load = load <16 x i8>, <16 x i8>* undef, align 1 %3 = icmp ne <16 x i8> %wide.load, zeroinitializer %4 = and <16 x i1> %3, %2 %5 = zext <16 x i1> %4 to <16 x i8> store <16 x i8> %5, <16 x i8>* undef, align 1 ret void } $ llc < bugpoint-reduced-simplified.ll <timeout>	2021-03-02 11:24:07 +01:00
Dmitry Preobrazhensky	b5019ff92f	[AMDGPU][MC][GFX9+] Corrected encoding of op_sel_hi for unused operands in VOP3P Corrected encoding of VOP3P op_sel_hi for unused operands. See bug 49363. Differential Revision: https://reviews.llvm.org/D97689	2021-03-02 13:02:25 +03:00
Stefan Gränitz	d3d5045b8b	[lli] Test debug support in RuntimeDyld with built-in functions When lli runs the below IR, it emits in-memory debug objects and registers them with the GDB JIT interface. The tests dump and check the registered information. IR has limited ability to produce complex output in a portable way. Instead the tests rely on built-in functions implemented in lli. They use a new command line flag `-generate=function-name` to instruct the ORC JIT to expose the built-in function with the given name to the JITed program. `debug-descriptor-elf-minimal.ll` calls `__dump_jit_debug_descriptor()` to reflect the list of debug entries issued for itself after emitting the main module. The output is textual and can be checked straight away. `debug-objects-elf-minimal.ll` calls `__dump_jit_debug_objects()`, which instructs lli to walk through the list of debug entries and append the encountered in-memory objects to the program output. We feed this output into llvm-dwarfdump to parse the DWARF in each file and dump their structures. We can do the same for JITLink once D97335 has landed. Reviewed By: lhames Differential Revision: https://reviews.llvm.org/D97694	2021-03-02 10:39:09 +01:00
Juneyoung Lee	8129c4f928	[JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select form of and/ors because the function does not look into binops as well	2021-03-02 18:35:18 +09:00
Benjamin Kramer	fd82eec395	[AArch64] Mark test depending on -debug as requiring asserts	2021-03-02 10:28:22 +01:00
David Green	c33ab57f7c	[ARM] Add handling of t2LDRSB/t2LDRSH in Constant Island Pass These constant pool loads should be treated similarly to t2LDRB/t2LDRH, acting on the same offset ranges. Add handling and a simple test.	2021-03-02 08:46:07 +00:00
Kazu Hirata	99066de18c	[IR] Use range-based for loops (NFC)	2021-03-01 23:40:33 -08:00
Kazu Hirata	e503e87089	[readobj] Use ListSeparator (NFC)	2021-03-01 23:40:31 -08:00


211992 Commits