1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

211992 Commits

Author SHA1 Message Date
Markus Böck
20787e83b0 [PR48898][CMake] Support MinGW Toolchain tool sin llvm_ExternalProject_Add
Windows is in the unique position of having two drivers, clang-cl and normal GNU clang, depending on whether a GNU or MSVC target is used. The current implementation with the USE_TOOLCHAIN argument assumes that when CMAKE_SYSTEM_NAME is set to Windows that clang-cl should be used, which is the incorrect choice when targeting a GNU environment.

This patch solves this problem by adding an optional TARGET_TRIPLE argument to llvm_ExternalProject_Add, which sets the various CMAKE_<LANG>_COMPILER_TARGET variables. Additionally, if the triple is detected as an MSVC environment, clang-cl and similar MSVC specific tools will be used instead of the GNU tools.
2021-03-02 22:45:05 +01:00
Heejin Ahn
af019c3b48 [WebAssembly] Fix more ExceptionInfo grouping bugs
This fixes two bugs in `WebAssemblyExceptionInfo` grouping, created by
D97247. These two bugs are not easy to split into two different CLs,
because tests that fail for one also tend to fail for the other.

- In D97247, when fixing `ExceptionInfo` grouping by taking out
  the unwind destination' exception from the unwind src's exception, we
  just iterated the BBs in the function order, but this was incorrect;
  this changes it to dominator tree preorder. Please refer to the
  comments in the code for the reason and an example.

- After this subexception-taking-out fix, there still can be remaining
  BBs we have to take out. When Exception B is taken out of Exception A
  (because EHPad B is the unwind destination of EHPad A), there can
  still be BBs within Exception A that are reachable from Exception B,
  which also should be taken out. Please refer to the comments in the
  code for more detailed explanation on why this can happen. To make
  this possible, this splits `WebAssemblyException::addBlock` into two
  parts: adding to a set and adding to a vector. We need to iterate on
  BBs within a `WebAssemblyException` to fix this, so we add BBs to sets
  first. But we add BBs to vectors later after we fix all incorrectness
  because deleting BBs from vectors is expensive. I considered removing
  the vector from `WebAssemblyException`, but it was not easy because
  this class has to maintain a similar interface with `MachineLoop` to
  be wrapped into a single interface `SortRegion`, which is used in
  CFGSort.

Other misc. drive-by fixes:
- Make `WebAssemblyExceptionInfo` do not even run when wasm EH is not
  used or the function doesn't have any EH pads, not to waste time
- Add `LLVM_DEBUG` lines for easy debugging
- Fix `preds` comments in cfg-stackify-eh.ll
- Fix `__cxa_throw`'s signature in cfg-stackify-eh.ll

Fixes https://github.com/emscripten-core/emscripten/issues/13554.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D97677
2021-03-02 13:44:09 -08:00
Nikita Popov
888a67345b [LICM] Make promotion faster
Even when MemorySSA-based LICM is used, an AST is still populated
for scalar promotion. As the AST has quadratic complexity, a lot
of time is spent in this step despite the existing access count
limit. This patch optimizes the identification of promotable stores.

The idea here is pretty simple: We're only interested in must-alias
mod sets of loop invariant pointers. As such, only populate the AST
with loop-invariant loads and stores (anything else is definitely
not promotable) and then discard any sets which alias with any of
the remaining, definitely non-promotable accesses.

If we promoted something, check whether this has made some other
accesses loop invariant and thus possible promotion candidates.

This is much faster in practice, because we need to perform AA
queries for O(NumPromotable^2 + NumPromotable*NumNonPromotable)
instead of O(NumTotal^2), and NumPromotable tends to be small.
Additionally, promotable accesses have loop invariant pointers,
for which AA is cheaper.

This has a signicant positive compile-time impact. We save ~1.8%
geomean on CTMark at O3, with 6% on lencod in particular and 25%
on individual files.

Conceptually, this change is NFC, but may not be so in practice,
because the AST is only an approximation, and can produce
different results depending on the order in which accesses are
added. However, there is at least no impact on the number of promotions
(licm.NumPromoted) in test-suite O3 configuration with this change.

Differential Revision: https://reviews.llvm.org/D89264
2021-03-02 22:10:48 +01:00
Yonghong Song
4e4cb187f5 BPF: Fix a bug in peephole TRUNC elimination optimization
Andrei Matei reported a llvm11 core dump for his bpf program
   https://bugs.llvm.org/show_bug.cgi?id=48578
The core dump happens in LiveVariables analysis phase.
  #4 0x00007fce54356bb0 __restore_rt
  #5 0x00007fce4d51785e llvm::LiveVariables::HandleVirtRegUse(unsigned int,
      llvm::MachineBasicBlock*, llvm::MachineInstr&)
  #6 0x00007fce4d519abe llvm::LiveVariables::runOnInstr(llvm::MachineInstr&,
      llvm::SmallVectorImpl<unsigned int>&)
  #7 0x00007fce4d519ec6 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int)
  #8 0x00007fce4d51a4bf llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&)
The bug can be reproduced with llvm12 and latest trunk as well.

Futher analysis shows that there is a bug in BPF peephole
TRUNC elimination optimization, which tries to remove
unnecessary TRUNC operations (a <<= 32; a >>= 32).
Specifically, the compiler did wrong transformation for the
following patterns:
   %1 = LDW ...
   %2 = SLL_ri %1, 32
   %3 = SRL_ri %2, 32
   ... %3 ...
   %4 = SRA_ri %2, 32
   ... %4 ...

The current transformation did not check how many uses of %2
and did transformation like
   %1 = LDW ...
   ... %1 ...
   %4 = SRL_ri %2, 32
   ... %4 ...
and pseudo register %2 is used by not defined and
caused LiveVariables analysis core dump.

To fix the issue, when traversing back from SRL_ri to SLL_ri,
check to ensure SLL_ri has only one use. Otherwise, don't
do transformation.

Differential Revision: https://reviews.llvm.org/D97792
2021-03-02 13:03:42 -08:00
Amara Emerson
17dd5ced91 [AArch64][GlobalISel] Enable use of the optsize predicate in the selector.
To do this while supporting the existing functionality in SelectionDAG of using
PGO info, we add the ProfileSummaryInfo and LazyBlockFrequencyInfo analysis
dependencies to the instruction selector pass.

Then, use the predicate to generate constant pool loads for f32 materialization,
if we're targeting optsize/minsize.

Differential Revision: https://reviews.llvm.org/D97732
2021-03-02 12:55:51 -08:00
Stefan Gränitz
ae81b5754a [llvm-jitlink] Prevent missing symbols from JITLoaderGDB with MSVC mangling
The issue came up on builder clang-x64-windows-msvc after 5182a7901a5d83dfd15021d01e8a1899910130ec
2021-03-02 21:44:54 +01:00
Sanjay Patel
0f5d46fc15 [SDAG] allow partial undef vector constants with select->logic folds
This is an enhancement suggested in the original review/commit:
D97730 / 7fce3322a283
2021-03-02 14:29:15 -05:00
Sanjay Patel
deff241ffa [AArch64] add select tests with partial vector undefs; NFC 2021-03-02 14:29:15 -05:00
David Green
e8b666c3d9 [ARM] Use 0, not ZR during ISel for CSINC/INV/NEG
Instead of converting the 0 into a ZR reg during lowering, do that with
tablegen by matching the zero immediate. This when combined with other
optimizations is more likely to use ZR and helps keep the DAG more
easily optimizable. It should not otherwise effect code generation.
2021-03-02 19:01:14 +00:00
Jonas Paulsson
1ba442ae80 [SystemZ] Assign the full space for promoted and split outgoing args.
When a large "irregular" (e.g. i96) integer call argument is converted to
indirect, 64-bit parts are stored to the stack. The full stack space
(e.g. i128) was not allocated prior to this patch, but rather just the exact
space of the original type. This caused neighboring values on the stack to be
overwritten.

Thanks to Josh Stone for reporting this.

Review: Ulrich Weigand
Fixes https://bugs.llvm.org/show_bug.cgi?id=49322
Differential Revision: https://reviews.llvm.org/D97514
2021-03-02 12:56:47 -06:00
Nico Weber
0cde944f58 [gn build] fix llvm-jitlink tests on linux after ef2389235c5dec0 2021-03-02 13:41:09 -05:00
Joe Nash
f210e9dbe7 [AMDGPU] Make OMod explicit for V_CVT_{U,I}*
Make OMod explicit instead of implied by HasModifiers in the
operand list. Requires explicitly setting HasOMod=1 for
irregular OMod usage in instruction V_CVT_{U,I}*

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D97587

Change-Id: I230e1476f529e816eec60e242531f23a99e3839f
2021-03-02 13:32:06 -05:00
Vy Nguyen
7abc52c73c [lld-macho] Change loadReexport to handle the case where a TAPI re-exports to reference documents nested within other TBD.
Currently, it was delibrately impleneted to not handle this case, but as it has turnt out, we need this feature.
The concrete use case is
       `System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa` reexports
               /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit , which then rexports
                    /System/Library/PrivateFrameworks/UIFoundation.framework/Versions/A/UIFoundation

The current implemention uses a global currentTopLevelTapi, which is not reset until it finishes loading the whole tree.
This is a problem because if the top-level is set to Cocoa, then when we get to UIFoundation, it will try to find UIFoundation in the current top level, which is Cocoa and will not find it.

The right thing should be:
 - When loading a library from a TBD file, re-exports need to be looked up in the auxiliary documents within the same TBD.
 - When loading from an actual dylib, no additional TBD documents need to be examined.
 - In no case does a re-export mentioned in one TBD file need to be looked up in a document in an auxiliary document from a different TBD file

Differential Revision: https://reviews.llvm.org/D97438
2021-03-02 12:14:31 -05:00
Krzysztof Parzyszek
1e0e652e78 [TableGen] Add IntrNoMerge as intrinsic property
There is a function attribute 'nomerge' in addition to 'noduplicate'
and 'convergent'. Both 'noduplicate' and 'convergent' have corresponding
intrinsic properties. This patch adds an intrinsic property for the
'nomerge' attribute.

Differential Revision: https://reviews.llvm.org/D96364
2021-03-02 09:04:50 -08:00
Fraser Cormack
97dc903cfa [RISCV] Support fixed-length INSERT_VECTOR_ELT
This patch enables support for lowering INSERT_VECTOR_ELT on
fixed-length vector types. The strategy follows that for scalable vector
types.

This patch also includes a quick fix to prevent the compiler infinitely
looping between lowering BUILD_VECTOR as VECTOR_SHUFFLE and back again.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97698
2021-03-02 16:48:38 +00:00
Alexey Bataev
5fcbc5f0f5 [Instcombine][NFC]Simplify logical reductions tests, NFC. 2021-03-02 08:27:42 -08:00
dfukalov
a698d3528e [AA] Cache (optionally) estimated PartialAlias offsets.
For the cases of two clobbering loads and one loaded object is fully contained
in the second `BasicAAResult::aliasGEP` returns just `PartialAlias` that
is actually more common case of partial overlap, it doesn't say anything about
actual overlapping sizes.

AA users such as GVN and DSE have no functionality to estimate aliasing of GEPs
with non-constant offsets. The change stores estimated relative offsets so they
can be used further.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93529
2021-03-02 19:04:15 +03:00
Nico Weber
bdc0a8276c [gn build] (manually) port 99a6d003edbe 2021-03-02 10:31:59 -05:00
Nico Weber
f0eda1a2cd [gn build] Port f47ff8cff1ed 2021-03-02 10:31:59 -05:00
Nico Weber
84bbbf8474 [gn build] Port ef2389235c5d 2021-03-02 10:31:59 -05:00
Tim Northover
3073fe6437 AArch64: report fp16 arithmetic is present for apple-a11 CPU.
AArch64.td got it right, but the target-parser dropped it, leading to missing
feature flags in Clang.
2021-03-02 15:07:18 +00:00
Simon Pilgrim
a8e2fbeea0 [DSE] eliminateDeadStoresMemorySSA - fix "initialization is never read" clang-tidy warning. NFCI. 2021-03-02 15:01:33 +00:00
Alexey Bataev
48b45a5713 [SLP]Merge reorder and reuse shuffles.
It is possible to merge reuse and reorder shuffles and reduce the total
cost of the vectorization tree/number of final instructions.

Differential Revision: https://reviews.llvm.org/D94992
2021-03-02 06:39:47 -08:00
Stefan Gränitz
0d86034335 [Orc] Fix MSVC error: conversion from 'initializer list' requires a narrowing 2021-03-02 15:34:36 +01:00
Sanjay Patel
b5d00c90ab [SDAG] allow vector types for select->logic folds
This prepares codegen for a change that will remove the identical
folds from IR because they are not poison-safe. See
D93065 / D97360
for details.

We already generically support scalar types, and there are various
target-specific transforms that overlap the vector folds. For example,
x86 recognizes the and patterns, but not or. We can end up with 1
extra instruction there, but I think that is still preferred over the
blendv alternative that loads a constant vector.

If this is not optimal, then it should be fixed with a later transform
(this change is not expected to result in any regressions because
InstCombine currently does the same thing).

Removing custom code and supporting undefs in constant-pattern-matching
can be follow-up changes.

Differential Revision: https://reviews.llvm.org/D97730
2021-03-02 09:25:10 -05:00
Stefan Gränitz
fb33a58b19 [Orc] Fix remaining memory size of slab allocator 2021-03-02 15:07:37 +01:00
Stefan Gränitz
e9a5668ec5 [docs][JITLink] Fix a typo (NFC) 2021-03-02 15:07:36 +01:00
Stefan Gränitz
4d74454f3a [Orc] Extend lli debug support tests to JITLink 2021-03-02 15:07:36 +01:00
Stefan Gränitz
7dea8abe52 [lli] Add JITLink in-process debug support
lli aims to provide both, RuntimeDyld and JITLink, as the dynamic linkers/loaders for it's JIT implementations. And they both offer debugging via the GDB JIT interface, which builds on the two well-known symbol names `__jit_debug_descriptor` and `__jit_debug_register_code`. As these symbols must be unique accross the linked executable, we can only define them in one of the libraries and make the other depend on it. OrcTargetProcess is a minimal stub for embedding a JIT client in remote executors. For the moment it seems reasonable to have the definition there and let ExecutionEngine depend on it, until we find a better solution.

This is the second commit for the reviewed patch.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97339
2021-03-02 15:07:36 +01:00
Stefan Gränitz
44d6e1a3fb [Orc] Add JITLink debug support plugin for ELF x86-64
Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that.

The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a  `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`.

This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them.

Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility.

The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future.

[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97335
2021-03-02 15:07:35 +01:00
Stefan Gränitz
0e6bd8e63e [Orc] Rename local variable to avoid confusion with equally-named class member (NFC) 2021-03-02 15:07:35 +01:00
Stefan Gränitz
511a7fc0e2 [Orc] Fix a file header (NFC) 2021-03-02 15:07:34 +01:00
Stefan Gränitz
e28df8cd15 [JITLink] LinkGraph::getName() can be const 2021-03-02 15:07:34 +01:00
Stefan Gränitz
7f7a8dc368 [JITLink] Remove some std::move(MemoryBufferRef) below createLinkGraphFromObject() (NFC) 2021-03-02 15:07:34 +01:00
Stefan Gränitz
c364052a21 [llvm-jitlink] Remove duplicate type defintion (NFC) 2021-03-02 15:07:33 +01:00
Stefan Gränitz
0621b0f242 [lli] Add --jit-linker command line argument
The argument value determines the dynamic linker to use (`default`, `rtdyld` or `jitlink`). The JITLink implementation only supports in-process JITing for now. This is the first commit for the reviewed patch.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97339
2021-03-02 15:07:33 +01:00
Simon Pilgrim
0d25232cae [DAG] DAGCombiner::tryStoreMergeOfLoads - remove unused StartAddress variable. NFCI.
Noticed in "initialization is never read" clang-tidy warning - the only StartAddress set/used is inside the load combine loop.
2021-03-02 13:29:31 +00:00
Benjamin Kramer
d61c913130 [MCParser] Bring back srcmanager diagnostics in AsmParser
AsmParser may have no LLVMContext attached to it, which means after
5de2d189e6ad466a1f0616195e8c524a4eb3cbc0 everything goes to stderr.
Restore the old behavior.
2021-03-02 13:43:03 +01:00
Simon Pilgrim
54a3086783 [AMDGPU] Fix "initialization is never read" clang-tidy warnings. NFCI. 2021-03-02 12:06:24 +00:00
Fraser Cormack
d1351b0b35 [RISCV] Lower CONCAT_VECTORS to INSERT_SUBVECTOR nodes
The default expansion of CONCAT_VECTORS goes through the stack. This
patch avoids that penalty by custom-lowering CONCAT_VECTORS to a series
of INSERT_SUBVECTOR nodes. Futher optimizations are possible, but this
is a good start.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D97692
2021-03-02 11:13:59 +00:00
Jan Svoboda
3c9c0a88f1 [clang][cli] NFC: Rename marshalling multiclass
The new name drops `String` from `MarshallingInfoStringInt`, which follows the naming convention of other marshalling multiclasses.
2021-03-02 11:53:40 +01:00
Florian Hahn
f586accda1 [LV] Add test cases that require a larger number of RT checks.
Precommit tests cases for D75981.
2021-03-02 10:49:38 +00:00
Benjamin Kramer
2e5b7002dd Revert "[X86] Fold shuffle(not(x),undef) -> not(shuffle(x,undef))"
This reverts commit 925093d88ae74560a8e94cf66f95d60ea3ffa2d3.

Causes an infinite loop when compiling some shuffles:

$ cat bugpoint-reduced-simplified.ll
target triple = "x86_64-unknown-linux-gnu"

define void @foo() {
entry:
  %0 = load i8, i8* undef, align 1
  %broadcast.splatinsert = insertelement <16 x i8> poison, i8 %0, i32 0
  %1 = icmp ne <16 x i8> %broadcast.splatinsert, zeroinitializer
  %2 = shufflevector <16 x i1> %1, <16 x i1> undef, <16 x i32> zeroinitializer
  %wide.load = load <16 x i8>, <16 x i8>* undef, align 1
  %3 = icmp ne <16 x i8> %wide.load, zeroinitializer
  %4 = and <16 x i1> %3, %2
  %5 = zext <16 x i1> %4 to <16 x i8>
  store <16 x i8> %5, <16 x i8>* undef, align 1
  ret void
}

$ llc < bugpoint-reduced-simplified.ll
<timeout>
2021-03-02 11:24:07 +01:00
Dmitry Preobrazhensky
b5019ff92f [AMDGPU][MC][GFX9+] Corrected encoding of op_sel_hi for unused operands in VOP3P
Corrected encoding of VOP3P op_sel_hi for unused operands. See bug 49363.

Differential Revision: https://reviews.llvm.org/D97689
2021-03-02 13:02:25 +03:00
Stefan Gränitz
d3d5045b8b [lli] Test debug support in RuntimeDyld with built-in functions
When lli runs the below IR, it emits in-memory debug objects and registers them with the GDB JIT interface. The tests dump and check the registered information. IR has limited ability to produce complex output in a portable way. Instead the tests rely on built-in functions implemented in lli. They use a new command line flag `-generate=function-name` to instruct the ORC JIT to expose the built-in function with the given name to the JITed program.

`debug-descriptor-elf-minimal.ll` calls `__dump_jit_debug_descriptor()` to reflect the list of debug entries issued for itself after emitting the main module. The output is textual and can be checked straight away.

`debug-objects-elf-minimal.ll` calls `__dump_jit_debug_objects()`, which instructs lli to walk through the list of debug entries and append the encountered in-memory objects to the program output. We feed this output into llvm-dwarfdump to parse the DWARF in each file and dump their structures.

We can do the same for JITLink once D97335 has landed.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D97694
2021-03-02 10:39:09 +01:00
Juneyoung Lee
8129c4f928 [JumpThreading] Fix tryToUnfoldSelectInCurrBB to treat and/or and its select form equally
This is a minor fix to update tryToUnfoldSelectInCurrBB to ignore select
form of and/ors because the function does not look into binops as well
2021-03-02 18:35:18 +09:00
Benjamin Kramer
fd82eec395 [AArch64] Mark test depending on -debug as requiring asserts 2021-03-02 10:28:22 +01:00
David Green
c33ab57f7c [ARM] Add handling of t2LDRSB/t2LDRSH in Constant Island Pass
These constant pool loads should be treated similarly to t2LDRB/t2LDRH,
acting on the same offset ranges. Add handling and a simple test.
2021-03-02 08:46:07 +00:00
Kazu Hirata
99066de18c [IR] Use range-based for loops (NFC) 2021-03-01 23:40:33 -08:00
Kazu Hirata
e503e87089 [readobj] Use ListSeparator (NFC) 2021-03-01 23:40:31 -08:00