This way, once there's an error in the snippet file (like in the test),
llvm-exegesis won't crash with an assertion failure,
but print a nice diagnostic about the problem.
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.
Also make sure that if Warning() returns true, we always treat it as an error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92504
This makes it possible to build libLLVM.so without first creating a
static library for each component. In the case where only libLLVM.so is
built (i.e. ninja LLVM) this eliminates 150 linker jobs.
Reviewed By: stellaraccident
Differential Revision: https://reviews.llvm.org/D95727
in this patch we add a new libLTO API to specify debug options independent of an lto_code_gen_t.
This allows clients to pass codegen flags (through libLTO) which otherwise today are ignored.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D92611
Adds utilities for creating anonymous pointers and jump stubs to x86_64.h. These
are used by the GOT and Stubs builder, but may also be used by pass writers who
want to create pointer stubs for indirection.
This patch also switches the underlying type for LinkGraph content from
StringRef to ArrayRef<char>. This avoids any confusion when working with buffers
that contain null bytes in the middle like, for example, a newly added null
pointer content array. ;)
Use profiled call edges to augment the top-down order. There are cases that the top-down order computed based on the static call graph doesn't reflect real execution order. For example:
1. Incomplete static call graph due to unknown indirect call targets. Adjusting the order by considering indirect call edges from the profile can enable the inlining of indirect call targets by allowing the caller processed before them.
2. Mutual call edges in an SCC. The static processing order computed for an SCC may not reflect the call contexts in the context-sensitive profile, thus may cause potential inlining to be overlooked. The function order in one SCC is being adjusted to a top-down order based on the profile to favor more inlining.
3. Transitive indirect call edges due to inlining. When a callee function is inlined into into a caller function in LTO prelink, every call edge originated from the callee will be transferred to the caller. If any of the transferred edges is indirect, the original profiled indirect edge, even if considered, would not enforce a top-down order from the caller to the potential indirect call target in LTO postlink since the inlined callee is gone from the static call graph.
4. #3 can happen even for direct call targets, due to functions defined in header files. Header functions, when included into source files, are defined multiple times but only one definition survives due to ODR. Therefore, the LTO prelink inlining done on those dropped definitions can be useless based on a local file scope. More importantly, the inlinee, once fully inlined to a to-be-dropped inliner, will have no profile to consume when its outlined version is compiled. This can lead to a profile-less prelink compilation for the outlined version of the inlinee function which may be called from external modules. while this isn't easy to fix, we rely on the postlink AutoFDO pipeline to optimize the inlinee. Since the survived copy of the inliner (defined in headers) can be inlined in its local scope in prelink, it may not exist in the merged IR in postlink, and we'll need the profiled call edges to enforce a top-down order for the rest of the functions.
Considering those cases, a profiled call graph completely independent of the static call graph is constructed based on profile data, where function objects are not even needed to handle case #3 and case 4.
I'm seeing an average 0.4% perf win out of SPEC2017. For certain benchmark such as Xalanbmk and GCC, the win is bigger, above 2%.
The change is an enhancement to https://reviews.llvm.org/D95988.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D99351
This flag allows the developer to see the result of linking even if it fails the verifier, as a step in debugging cases where the linked module fails the verifier.
Differential Revision: https://reviews.llvm.org/D99382
MCJIT served well as the default JIT engine in lli for a long time, but the code is getting old and maintenance efforts don't seem to be in sight. In the meantime Orc became mature enough to fill that gap. The newly added greddy mode is very similar to the execution model of MCJIT. It should work as a drop-in replacement for common JIT tasks.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D98931
This option tells LLJIT to disable platform support explicitly: JITDylibs aren't scanned for special init/deinit symbols and no runtime API interposes are injected.
It's useful in two cases: for platforms that don't have such requirements and platforms for which we have no explicit support yet and that don't work well with the generic IR platform.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D99416
Encountered a crash while running a debug build, where this code path would be taken due to a mismatch in profile coverage data versions. Without consuming the error, an assert would be triggered inside the destructor of Error.
Differential Revision: https://reviews.llvm.org/D99457
This change sets up a framework in llvm-profgen to estimate inline decision and adjust context-sensitive profile based on that. We call it a global pre-inliner in llvm-profgen.
It will serve two purposes:
1) Since context profile for not inlined context will be merged into base profile, if we estimate a context will not be inlined, we can merge the context profile in the output to save profile size.
2) For thinLTO, when a context involving functions from different modules is not inined, we can't merge functions profiles across modules, leading to suboptimal post-inline count quality. By estimating some inline decisions, we would be able to adjust/merge context profiles beforehand as a mitigation.
Compiler inline heuristic uses inline cost which is not available in llvm-profgen. But since inline cost is closely related to size, we could get an estimate through function size from debug info. Because the size we have in llvm-profgen is the final size, it could also be more accurate than the inline cost estimation in the compiler.
This change only has the framework, with a few TODOs left for follow up patches for a complete implementation:
1) We need to retrieve size for funciton//inlinee from debug info for inlining estimation. Currently we use number of samples in a profile as place holder for size estimation.
2) Currently the thresholds are using the values used by sample loader inliner. But they need to be tuned since the size here is fully optimized machine code size, instead of inline cost based on not yet fully optimized IR.
Differential Revision: https://reviews.llvm.org/D99146
Using $ breaks demangling of the symbols. For example,
$ c++filt _Z3foov\$123
_Z3foov$123
This causes problems for developers who would like to see nice stack traces
etc., but also for automatic crash tracking systems which try to organize
crashes based on the stack traces.
Instead, use the period as suffix separator, since Itanium demanglers normally
ignore such suffixes:
$ c++filt _Z3foov.123
foo() [clone .123]
This is already done in some places; try to do it everywhere.
Differential revision: https://reviews.llvm.org/D97484
In future patches I will be setting the IsText parameter frequently so I will refactor the args to be in the following order. I have removed the FileSize parameter because it is never used.
```
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFile(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true, bool IsVolatile = false);
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFileOrSTDIN(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true);
static ErrorOr<std::unique_ptr<MB>>
getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset,
bool IsText, bool RequiresNullTerminator, bool IsVolatile);
static ErrorOr<std::unique_ptr<WritableMemoryBuffer>>
getFile(const Twine &Filename, bool IsVolatile = false);
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D99182
The option `--prefix-strip` is only used when `--prefix` is not empty.
It removes N initial directories from absolute paths before adding the
prefix.
This matches GNU's objdump behavior.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96679
This is a follow-up for:
D98604 [MCA] Ensure that writes occur in-order
When instructions are aligned by the order of writes, they retire
in-order naturally. There is no need for an RCU, so it is disabled.
Differential Revision: https://reviews.llvm.org/D98628
This patch renames the "Initial" member of WasmLimits to the name used
in the spec, "Minimum".
In the core WebAssembly specification, the Limits data type has one
required "min" member and one optional "max" member, indicating the
minimum required size of the corresponding table or memory, and the
maximum size, if any.
Although the WebAssembly spec does instantiate locally-defined tables
and memories with the initial size being equal to the minimum size, it
can't impose such a requirement for imports. It doesn't make sense to
require an initial size for a memory import, for example. The compiler
can only sensibly express the minimum and maximum sizes.
See
https://github.com/WebAssembly/js-types/blob/master/proposals/js-types/Overview.md#naming-of-size-limits
for a related discussion that agrees that the right name of "initial" is
"minimum" when querying the type of a table or memory from JavaScript.
(Of course it still makes sense for JS to speak in terms of an initial
size when it explicitly instantiates memories and tables.)
Differential Revision: https://reviews.llvm.org/D99186
Exclude AArch64 mapping symbols ($x and $d) for symtab symbolization as
it was done for ARM since D95916 tom bring bots back to green state.
This is implemented by setting SF_FormatSpecific such that
llvm-symbolizer will ignore them, and use this flag to re-implement
llvm-nm --special-syms option which make it work for both targets.
Differential Revision: https://reviews.llvm.org/D98803
MCJIT served well as the default JIT engine in lli for a long time, but the code is getting old and maintenance efforts don't seem to be in sight. In the meantime Orc became mature enough to fill that gap. The newly added greddy mode is very similar to the execution model of MCJIT. It should work as a drop-in replacement for common JIT tasks.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D98931
This patch adds a fallthrough bit to basic block metadata, indicating whether the basic block can fallthrough without taking any branches. The bit will help us avoid an intel LBR bug which results in occasional duplicate entries at the beginning of the LBR stack.
This patch uses `MachineBasicBlock::canFallThrough()` to set the bit. This is not a const method because it eventually calls `TargetInstrInfo::analyzeBranch`, but it calls this function with the default `AllowModify=false`. So we can either make the argument to the `getBBAddrMapMetadata` non-const, or we can use `const_cast` when calling `canFallThrough`. I decide to go with the latter since this is purely due to legacy code, and in general we should not allow the BasicBlock to be mutable during `getBBAddrMapMetadata`.
Reviewed By: tmsriram
Differential Revision: https://reviews.llvm.org/D96918
Fix spurious warnings for missing symbols with thinLTO. The latter
appends a unique suffix to avoid collisions for exported private
symbols, resulting in dsymutil complaining it couldn't find the symbol
in the object file.
rdar://75434058
Differential revision: https://reviews.llvm.org/D99125
Switch to use cold threshold from profile summary for cold context merging and trimming, instead of relying on hard coded values. Minor refactoring included for switch names, etc.
Differential Revision: https://reviews.llvm.org/D98921
writeToOutput function is useful when it is necessary to create different kinds
of streams(based on stream name) and when we need to use a temporary file
while writing(which would be renamed into the resulting file in a success case).
This patch moves the writeToStream helper into the Support library.
Differential Revision: https://reviews.llvm.org/D98426
Add diagnostic output for TCP connections on both sides, llvm-jitlink and llvm-jitlink-executor.
Port the executor to use getaddrinfo(3) as well. This makes the code more symmetric and seems to be the recommended way for implementing the server side.
Reviewed By: rzurob
Differential Revision: https://reviews.llvm.org/D98581
Since llvm-jitlink moved from gethostbyname to getaddrinfo in D95477, it seems to no longer connect to llvm-jitlink-executor via TCP. I can reproduce this behavior on both, Debian 10 and macOS 10.15.7:
```
> llvm-jitlink-executor listen=localhost:10819
--
> llvm-jitlink --oop-executor-connect=localhost:10819 /path/to/obj.o
Failed to resolve localhost:10819
```
Reviewed By: rzurob
Differential Revision: https://reviews.llvm.org/D98579
This is a similarity visualization tool that accepts a Module and
passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups
are output in a JSON file.
Tests are found in test/tools/llvm-sim and check for the file not found,
a bad module, and that the JSON is created correctly.
Reviewers: paquette, jroelofs, MaskRay
Recommit of: 15645d044bcfe2a0f63156048b302f997a717688 to fix linking
errors.
Differential Revision: https://reviews.llvm.org/D86974
Split from D91844.
The return value of function `ModuleLazyLoaderCache::operator()` in file llvm/tools/llvm-link/llvm-link.cpp. According to the bug report of my static analyzer, the std::function variable `ModuleLazyLoaderCache::createLazyModule` points to function `loadFile`, which may return `nullptr` when error. And the pointer is dereferenced without a check.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D97258
This changes adds attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in previous build.
This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
In the existing OrcLazy mode, modules go through partitioning and outgoing calls are replaced by reexport stubs that resolve on call-through. In greedy mode that this patch unlocks for lli, modules materialize as a whole and trigger materialization for all required symbols recursively. This is useful for testing (e.g. D98785) and it's more similar to the way MCJIT works.
This patch is follow-up for D91028. It implements direct writing into the
output stream for wasm.
Depends on D91028
Differential Revision: https://reviews.llvm.org/D95478
This patch removes creation of the resulting file from the
executeObjcopyOnBinary() function. For the most use cases, the
executeObjcopyOnBinary receives output file as a parameter
- raw_ostream &Out. The splitting .dwo file is implemented differently:
file containg .dwo tables is created inside executeObjcopyOnBinary().
When objcopy functionality would be moved into separate library,
current implementation will become inconvenient. The goal of that
refactoring is to separate concerns: It might be convenient to
to do dwo tables splitting but to create resulting file differently.
Differential Revision: https://reviews.llvm.org/D98582
The D93881 added functionality which preserve ownership for output file
if llvm-objcopy is called under root. That code was added into the place
where output file is created. The llvm-objcopy already has a function which
sets/restores rights/permissions for the output file.
That is the restoreStatOnFile() function. This patch moves code
(preserving ownershipping) into the restoreStatOnFile() function.
Differential Revision: https://reviews.llvm.org/D98511
Previously we didn't support to keep the unique linkage name(-funique-internal-linkage-name) in llvm-profgen. As discussed in https://reviews.llvm.org/D96932, we choose to do canonicalization for it.
Now since "selected" is set as the default parameter of getCanonicalFnName in `D96932`, we don't need to add any attribute here for the previous usage and only fix the missing usage in the pseudo probe decoding.
Differential Revision: https://reviews.llvm.org/D98226
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
should improve fast register allocation to allocate amx register.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D93594
For ThinLTO's prelink compilation, we need to put external inline candidates into an import list attached to function's entry count metadata. This enables ThinLink to treat such cross module callee as hot in summary index, and later helps postlink to import them for profile guided cross module inlining.
For AutoFDO, the import list is retrieved by traversing the nested inlinee functions. For CSSPGO, since profile is flatterned, a few things need to happen for it to work:
- When loading input profile in extended binary format, we need to load all child context profile whose parent is in current module, so context trie for current module includes potential cross module inlinee.
- In order to make the above happen, we need to know whether input profile is CSSPGO profile before start reading function profile, hence a flag for profile summary section is added.
- When searching for cross module inline candidate, we need to walk through the context trie instead of nested inlinee profile (callsite sample of AutoFDO profile).
- Now that we have more accurate counts with CSSPGO, we swtiched to use entry count instead of total count to decided if an external callee is potentially beneficial to inline. This make it consistent with how we determine whether call tagert is potential inline candidate.
Differential Revision: https://reviews.llvm.org/D98590
This makes the target triple, graph name, and full graph content available
when making decisions about how to populate the linker pass pipeline.
Also updates the LLJITWithObjectLinkingLayerPlugin example to show more
API use, including use of the API changes in this patch.
During D88827 it was requested to remove the local implementation
of Memory/File Buffers:
// TODO: refactor the buffer classes in LLVM to enable us to use them here
// directly.
This patch uses raw_ostream instead of Buffers. Generally, using streams
could allow us to reduce memory usages. No need to load all data into the
memory - the data could be streamed through a smaller buffer.
Thus, this patch uses raw_ostream as an interface for output data:
Error executeObjcopyOnBinary(CopyConfig &Config,
object::Binary &In,
raw_ostream &Out);
Note 1. This patch does not change the implementation of Writers
so that data would be directly stored into raw_ostream.
This is assumed to be done later.
Note 2. It would be better if Writers would be implemented in a such way
that data could be streamed without seeking/updating. If that would be
inconvenient then raw_ostream could be replaced with raw_pwrite_stream
to have a possibility to seek back and update file headers.
This is assumed to be done later if necessary.
Note 3. Current FileOutputBuffer allows using a memory-mapped file.
The raw_fd_ostream (which could be used if data should be stored in the file)
does not allow us to use a memory-mapped file. Memory map functionality
could be implemented for raw_fd_ostream:
It is possible to add resize() method into raw_ostream.
class raw_ostream {
void resize(uint64_t size);
}
That method, implemented for raw_fd_ostream, could create a memory-mapped file.
The streamed data would be written into that memory file then.
Thus we would be able to use memory-mapped files with raw_fd_ostream.
This is assumed to be done later if necessary.
Differential Revision: https://reviews.llvm.org/D91028
D96109 was recently submitted which contains the refactored implementation of
-funique-internal-linakge-names by adding the unique suffixes in clang rather
than as an LLVM pass. Deleting the former implementation in this change.
Differential Revision: https://reviews.llvm.org/D98234
llvm-jitlink and llvm-jitlink-executor make use of APIs that are
part of the socket and nsl libraries on SunOS systems (Solaris and
Illumos). Make sure they get linked.
Ran into this in Rust CI when cross-compiling LLVM 12 to these
targets.
Differential Revision: https://reviews.llvm.org/D97633
This diff introduces --keep-undefined in llvm-objcopy/llvm-strip for Mach-O
which makes the tools preserve undefined symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D97040
Context-sensitive AutoFDO profile has a different name scheme where full calling contexts are encoded as function names. When processing CS proifle, llvm-profdata should use full contexts instead of leaf function names.
Reviewed By: wmi, wenlei, wlei
Differential Revision: https://reviews.llvm.org/D97998
[llvm-exegesis] Disable the LBR check on AMD
https://bugs.llvm.org/show_bug.cgi?id=48918
The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.
Differential Revision: https://reviews.llvm.org/D97504
New change:
- Surround usages of x86 helper in llvm-exegesis/X86/Target.cpp with ifdef
- Fix bug which caused the caller of getVendorSignature to not have a copy of EAX that it expected.
With reference types, tables can have non-zero table numbers. This
commit adds support for element sections against these tables.
Differential Revision: https://reviews.llvm.org/D97923
The code was using the standard isalnum function which doesn't handle
values outside the non-ascii range. Switching to using llvm::isAlnum
instead ensures we don't provoke undefined behaviour, which can in some
cases result in crashes.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97663
This pass runs in any situations but we skip it when it is not O0 and the
function doesn't have optnone attribute. With -O0, the def of shape to amx
intrinsics is near the amx intrinsics code. We are not able to find a
point which post-dominate all the shape and dominate all amx intrinsics.
To decouple the dependency of the shape, we transform amx intrinsics
to scalar operation, so that compiling doesn't fail. In long term, we
should improve fast register allocation to allocate amx register.
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D93594
This change adds '-use-interfacestub' option to allow llvm-ifs
to use InterfaceStub lib when generating ELF binary.
Differential Revision: https://reviews.llvm.org/D94461
https://bugs.llvm.org/show_bug.cgi?id=48918
The bug reported a hang (or very very slow runtime) on a Zen2. Unfortunately, we don't have the hardware right now to debug it and I was not able to reproduce the bug on a HSW.
Theory we've got is that the lbr-checking code could be confused on AMD.
Differential Revision: https://reviews.llvm.org/D97504
This patch adds a pipeline to support in-order CPUs such as ARM
Cortex-A55.
In-order pipeline implements a simplified version of Dispatch,
Scheduler and Execute stages as a single stage. Entry and Retire
stages are common for both in-order and out-of-order pipelines.
Differential Revision: https://reviews.llvm.org/D94928
The help text and documentation for the --discard-all option failed to
mention that the option also causes the removal of debug sections. This
change fixes both for both llvm-objcopy and llvm-strip.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D97662
The check for whether an extended symbol index table was required
dropped the first SHN_LORESERVE sections from the sections array before
checking whether the remaining sections had symbols. Unfortunately, the
null section header is not present in this list, so the check was
skipping the first section that might be important. If that section
contained a symbol, and no subsequent ones did, the .symtab_shndx
section would not be emitted, leading to a corrupt object.
Also consolidate and expand test coverage in the area to cover this bug
and other aspects of the SYMTAB_SHNDX section.
Reviewed by: alexshap, MaskRay
Differential Revision: https://reviews.llvm.org/D97661
This reverts commit 900f076113302e26e1939541b546b0075e3e9721 and attempts an actual fix: All failing tests for llvm-jitlink use the `-noexec` flag. The inputs they operate on are not meant for execution on the host system. Looking e.g. at the MachO_test_harness_harnesss.s test, llvm-mc generates input machine code with "x86_64-apple-macosx10.9".
My previous attempt in bbdb4c8c9bcef0e8db751630accc04ad874f54e7 disabled the debug support plugin for Windows targets, but what we would actually want is to disable it on Windows HOSTS.
With the new patch here, I don't do exactly that, but instead follow the approach for the EH frame plugin and include the `-noexec` flag in the condition. It should have the desired effect when it comes to the test suite. It appears a little workaround'ish, but should work reliably for now. I will discuss the issue with Lang and see if we can do better. Thanks @thakis again for the temporary fix.
Previously we errored out when disassembling illegal instructions and there would be no profile generated. In fact illegal instructions are not uncommon and we'd better skip them and print "unknown" instead of erroring out. This matches the behavior of llvm-objdump (see disassembleObject in llvm-objdump.cpp).
Reviewed By: wlei, wenlei
Differential Revision: https://reviews.llvm.org/D97776
Currently, getSourceFile accesses file system to check if two paths are
the same file with a thread lock, which is a huge performance bottleneck
in some cases. Currently, it's accessing file system size(files) * size(files) times.
Thus, cache file status information, which reduces file system access to size(files) times.
When I tested it with two binaries and 16 cpu cores,
it saved over 70% of time.
Binary 1: 56 secs -> 3 secs
Binary 2: 17 hours -> 4 hours
Differential Revision: https://reviews.llvm.org/D97061
fix attempt http://reviews.llvm.org/rGbbdb4c8c9bcef0e didn't work
The problem is that the test tries to look up
llvm_orc_registerJITLoaderGDBWrapper from the llvm-jitlink.exe
executable, but the symbol wasn't exported. Just manually export it
for now. There's a FIXME with a suggestion for a real fix.
lli aims to provide both, RuntimeDyld and JITLink, as the dynamic linkers/loaders for it's JIT implementations. And they both offer debugging via the GDB JIT interface, which builds on the two well-known symbol names `__jit_debug_descriptor` and `__jit_debug_register_code`. As these symbols must be unique accross the linked executable, we can only define them in one of the libraries and make the other depend on it. OrcTargetProcess is a minimal stub for embedding a JIT client in remote executors. For the moment it seems reasonable to have the definition there and let ExecutionEngine depend on it, until we find a better solution.
This is the second commit for the reviewed patch.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97339
Add a new ObjectLinkingLayer plugin `DebugObjectManagerPlugin` and infrastructure to handle creation of `DebugObject`s as well as their registration in OrcTargetProcess. The current implementation only covers ELF on x86-64, but the infrastructure is not limited to that.
The journey starts with a new `LinkGraph` / `JITLinkContext` pair being created for a `MaterializationResponsibility` in ORC's `ObjectLinkingLayer`. It sends a `notifyMaterializing()` notification, which is forwarded to all registered plugins. The `DebugObjectManagerPlugin` aims to create a `DebugObject` form the provided target triple and object buffer. (Future implementations might create `DebugObject`s from a `LinkGraph` in other ways.) On success it will track it as the pending `DebugObject` for the `MaterializationResponsibility`.
This patch only implements the `ELFDebugObject` for `x86-64` targets. It follows the RuntimeDyld approach for debug object setup: it captures a copy of the input object, parses all section headers and prepares to patch their load-address fields with their final addresses in target memory. It instructs the plugin to report the section load-addresses once they are available. The plugin overrides `modifyPassConfig()` and installs a JITLink post-allocation pass to capture them.
Once JITLink emitted the finalized executable, the plugin emits and registers the `DebugObject`. For emission it requests a new `JITLinkMemoryManager::Allocation` with a single read-only segment, copies the object with patched section load-addresses over to working memory and triggers finalization to target memory. For registration, it notifies the `DebugObjectRegistrar` provided in the constructor and stores the previously pending`DebugObject` as registered for the corresponding MaterializationResponsibility.
The `DebugObjectRegistrar` registers the `DebugObject` with the target process. `llvm-jitlink` uses the `TPCDebugObjectRegistrar`, which calls `llvm_orc_registerJITLoaderGDBWrapper()` in the target process via `TargetProcessControl` to emit a `jit_code_entry` compatible with the GDB JIT interface [1]. So far the implementation only supports registration and no removal. It appears to me that it wouldn't raise any new design questions, so I left this as an addition for the near future.
[1] https://sourceware.org/gdb/current/onlinedocs/gdb/JIT-Interface.html
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97335
The argument value determines the dynamic linker to use (`default`, `rtdyld` or `jitlink`). The JITLink implementation only supports in-process JITing for now. This is the first commit for the reviewed patch.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97339
When lli runs the below IR, it emits in-memory debug objects and registers them with the GDB JIT interface. The tests dump and check the registered information. IR has limited ability to produce complex output in a portable way. Instead the tests rely on built-in functions implemented in lli. They use a new command line flag `-generate=function-name` to instruct the ORC JIT to expose the built-in function with the given name to the JITed program.
`debug-descriptor-elf-minimal.ll` calls `__dump_jit_debug_descriptor()` to reflect the list of debug entries issued for itself after emitting the main module. The output is textual and can be checked straight away.
`debug-objects-elf-minimal.ll` calls `__dump_jit_debug_objects()`, which instructs lli to walk through the list of debug entries and append the encountered in-memory objects to the program output. We feed this output into llvm-dwarfdump to parse the DWARF in each file and dump their structures.
We can do the same for JITLink once D97335 has landed.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97694
The situation with inline asm/MC error reporting is kind of messy at the
moment. The errors from MC layout are not reliably propagated and users
have to specify an inlineasm handler separately to get inlineasm
diagnose. The latter issue is not a correctness issue but could be improved.
* Kill LLVMContext inlineasm diagnose handler and migrate it to use
DiagnoseInfo/DiagnoseHandler.
* Introduce `DiagnoseInfoSrcMgr` to diagnose SourceMgr backed errors. This
covers use cases like inlineasm, MC, and any clients using SourceMgr.
* Move AsmPrinter::SrcMgrDiagInfo and its instance to MCContext. The next step
is to combine MCContext::SrcMgr and MCContext::InlineSrcMgr because in all
use cases, only one of them is used.
* If LLVMContext is available, let MCContext uses LLVMContext's diagnose
handler; if LLVMContext is not available, MCContext uses its own default
diagnose handler which just prints SMDiagnostic.
* Change a few clients(Clang, llc, lldb) to use the new way of reporting.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D97449
So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97338
This makes the behavior similar to cp
```
chmod u+s,g+s,o+x a
sudo llvm-strip a -o b
// With this patch, b drops set-user-ID and set-group-ID bits.
// sudo cp a b => b does not have set-user-ID or set-group-ID bits.
```
This also changes the behavior for the following case:
```
chmod u+s,g+s,o+x a
llvm-strip a
// a preserves set-user-ID and set-group-ID bits.
// This matches binutils<2.36 and probably >=2.37. 2.36 and 2.36.1 have some compatibility issues.
```
Differential Revision: https://reviews.llvm.org/D97253
Under certain (currently unknown) conditions, llvm-profdata is outputting
profiles that have two consecutive entries in the MemOPSize section for the
value 0. This causes the PGOMemOPSizeOpt pass to output an invalid switch
instruction with two cases for 0. As mentioned, we’re not quite sure what’s
causing this to happen, but this patch prevents llvm-profdata from outputting a
profile that has this problem and gives an error with a request for a
reproducible.
Differential Revision: https://reviews.llvm.org/D92074
As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96831
Small optimization of the code -- No need to calculate any stats
for NULL nodes, and also no need to call the collectStatsForDie()
if it is the CU itself.
Differential Revision: https://reviews.llvm.org/D96871
The presence or absence of an inline variable (as well as formal
parameter) with only an abstract_origin ref (without DW_AT_location)
should not change the location coverage.
It means, for both:
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
DW_TAG_formal_parameter
DW_AT_abstract_origin (0x0000005a "b")
and,
DW_TAG_inlined_subroutine
DW_AT_abstract_origin (0x0000004e "f")
DW_AT_low_pc (0x0000000000000010)
DW_AT_high_pc (0x0000000000000013)
we should report 0% location coverage. If we add DW_AT_location,
for both cases the coverage should be improved.
Differential Revision: https://reviews.llvm.org/D96045
Some instructions defined in table-gen files sets usesCustomInserter
bit, which means it has to be lowered by target code and isn't actually
valid instruction at MC level. So we should treat them like pseudo
instructions.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D94898
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].
The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.
For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:
$ opt -verify-debuginfo-preserve -pass-to-test sample.ll
Since this is very initial stage of the implementation,
there is a space for improvements such as:
- Add support for the new pass manager
- Add support for metadata other than DILocations and DISubprograms
[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker
Differential Revision: https://reviews.llvm.org/D82545
The test that was failing is now forced to use the old PM.
As discussed on the RFC [0], I am sharing the set of patches that
enables checking of original Debug Info metadata preservation in
optimizations. The proof-of-concept/proposal can be found at [1].
The implementation from the [1] was full of duplicated code,
so this set of patches tries to merge this approach into the existing
debugify utility.
For example, the utility pass in the original-debuginfo-check
mode could be invoked as follows:
$ opt -verify-debuginfo-preserve -pass-to-test sample.ll
Since this is very initial stage of the implementation,
there is a space for improvements such as:
- Add support for the new pass manager
- Add support for metadata other than DILocations and DISubprograms
[0] https://groups.google.com/forum/#!msg/llvm-dev/QOyF-38YPlE/G213uiuwCAAJ
[1] https://github.com/djolertrk/llvm-di-checker
Differential Revision: https://reviews.llvm.org/D82545
GCC warning:
```
[3397/3703] Building CXX object tools/llvm-profgen/CMakeFiles/llvm-profgen.dir/llvm-profgen.cpp.o
In file included from /llvm-project/llvm/include/llvm/ADT/STLExtras.h:19,
from /llvm-project/llvm/include/llvm/ADT/StringRef.h:12,
from /llvm-project/llvm/include/llvm/ADT/Twine.h:13,
from /llvm-project/llvm/tools/llvm-profgen/ErrorHandling.h:12,
from /llvm-project/llvm/tools/llvm-profgen/llvm-profgen.cpp:13:
/llvm-project/llvm/include/llvm/ADT/Optional.h: In instantiation of ‘void llvm::optional_detail::OptionalStorage<T, <anonymous> >::emplace(Args&& ...) [with Args = {const std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, llvm::sampleprof::LineLocation>}; T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’:
/llvm-project/llvm/include/llvm/ADT/Optional.h:79:7: required from ‘constexpr llvm::optional_detail::OptionalStorage<T, <anonymous> >::OptionalStorage(llvm::optional_detail::OptionalStorage<T, <anonymous> >&&) [with T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’
/llvm-project/llvm/include/llvm/ADT/Optional.h:253:13: required from here
/llvm-project/llvm/include/llvm/ADT/Optional.h:113:12: warning: cast from type ‘const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>*’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
113 | ::new ((void *)std::addressof(value)) T(std::forward<Args>(args)...);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[3398/3703] Building CXX object tools/llvm-profgen/CMakeFiles/llvm-profgen.dir/PerfReader.cpp.o
In file included from /llvm-project/llvm/include/llvm/ADT/STLExtras.h:19,
from /llvm-project/llvm/include/llvm/ADT/StringRef.h:12,
from /llvm-project/llvm/include/llvm/ADT/Twine.h:13,
from /llvm-project/llvm/tools/llvm-profgen/ErrorHandling.h:12,
from /llvm-project/llvm/tools/llvm-profgen/PerfReader.h:11,
from /llvm-project/llvm/tools/llvm-profgen/PerfReader.cpp:8:
/llvm-project/llvm/include/llvm/ADT/Optional.h: In instantiation of ‘void llvm::optional_detail::OptionalStorage<T, <anonymous> >::emplace(Args&& ...) [with Args = {const std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, llvm::sampleprof::LineLocation>}; T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’:
/llvm-project/llvm/include/llvm/ADT/Optional.h:79:7: required from ‘constexpr llvm::optional_detail::OptionalStorage<T, <anonymous> >::OptionalStorage(llvm::optional_detail::OptionalStorage<T, <anonymous> >&&) [with T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’
/llvm-project/llvm/include/llvm/ADT/Optional.h:253:13: required from here
/llvm-project/llvm/include/llvm/ADT/Optional.h:113:12: warning: cast from type ‘const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>*’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
113 | ::new ((void *)std::addressof(value)) T(std::forward<Args>(args)...);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[3399/3703] Building CXX object tools/llvm-profgen/CMakeFiles/llvm-profgen.dir/ProfiledBinary.cpp.o
In file included from /llvm-project/llvm/include/llvm/ADT/STLExtras.h:19,
from /llvm-project/llvm/include/llvm/ADT/ArrayRef.h:15,
from /llvm-project/llvm/include/llvm/ADT/DenseMapInfo.h:18,
from /llvm-project/llvm/include/llvm/ADT/DenseMap.h:16,
from /llvm-project/llvm/include/llvm/ADT/DenseSet.h:16,
from /llvm-project/llvm/include/llvm/ProfileData/SampleProf.h:17,
from /llvm-project/llvm/tools/llvm-profgen/CallContext.h:12,
from /llvm-project/llvm/tools/llvm-profgen/ProfiledBinary.h:12,
from /llvm-project/llvm/tools/llvm-profgen/ProfiledBinary.cpp:9:
/llvm-project/llvm/include/llvm/ADT/Optional.h: In instantiation of ‘void llvm::optional_detail::OptionalStorage<T, <anonymous> >::emplace(Args&& ...) [with Args = {const std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, llvm::sampleprof::LineLocation>}; T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’:
/llvm-project/llvm/include/llvm/ADT/Optional.h:79:7: required from ‘constexpr llvm::optional_detail::OptionalStorage<T, <anonymous> >::OptionalStorage(llvm::optional_detail::OptionalStorage<T, <anonymous> >&&) [with T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’
/llvm-project/llvm/include/llvm/ADT/Optional.h:253:13: required from here
/llvm-project/llvm/include/llvm/ADT/Optional.h:113:12: warning: cast from type ‘const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>*’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
113 | ::new ((void *)std::addressof(value)) T(std::forward<Args>(args)...);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[3404/3703] Building CXX object tools/llvm-profgen/CMakeFiles/llvm-profgen.dir/ProfileGenerator.cpp.o
In file included from /llvm-project/llvm/include/llvm/ADT/STLExtras.h:19,
from /llvm-project/llvm/include/llvm/ADT/StringRef.h:12,
from /llvm-project/llvm/include/llvm/ADT/Twine.h:13,
from /llvm-project/llvm/tools/llvm-profgen/ErrorHandling.h:12,
from /llvm-project/llvm/tools/llvm-profgen/ProfileGenerator.h:11,
from /llvm-project/llvm/tools/llvm-profgen/ProfileGenerator.cpp:9:
/llvm-project/llvm/include/llvm/ADT/Optional.h: In instantiation of ‘void llvm::optional_detail::OptionalStorage<T, <anonymous> >::emplace(Args&& ...) [with Args = {const std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, llvm::sampleprof::LineLocation>}; T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’:
/llvm-project/llvm/include/llvm/ADT/Optional.h:79:7: required from ‘constexpr llvm::optional_detail::OptionalStorage<T, <anonymous> >::OptionalStorage(llvm::optional_detail::OptionalStorage<T, <anonymous> >&&) [with T = const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>; bool <anonymous> = false]’
/llvm-project/llvm/include/llvm/ADT/Optional.h:253:13: required from here
/llvm-project/llvm/include/llvm/ADT/Optional.h:113:12: warning: cast from type ‘const std::pair<std::__cxx11::basic_string<char>, llvm::sampleprof::LineLocation>*’ to type ‘void*’ casts away qualifiers [-Wcast-qual]
113 | ::new ((void *)std::addressof(value)) T(std::forward<Args>(args)...);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
As discussed in D95511, this allows us to encode invalid BBAddrMap
sections to be used in more rigorous testing.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D96831
lld already marks shared library defs as ExportDynamic, which prevents
potentially unsafe devirtualization of symbols defined in shared
libraries. Match that behavior in the gold plugin, and add the same
test.
Depends on D96721.
Differential Revision: https://reviews.llvm.org/D96722
1. Emit warnings for files without symbols.
2. Add -no_warning_for_no_symbols.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D95843
When the `DWOPath` is absolute, we want to use `DWOPath` as is, without prepending any other
components to the path. The `sys::path::append` does not join, but rather unconditionally appends
the paths, so something like `sys::path::append("/tmp", "/tmp/banana")` will result in
`/tmp/tmp/banana` rather than the desired `/tmp/banana`.
This then causes `llvm-dwp` to fail in a following situation:
```
$ clang -gsplit-dwarf /tmp/banana/test.c -c -o /tmp/outdir/foo.o
$ clang outdir/foo.o -o outdir/hm
$ llvm-dwarfdump outdir/hm | grep -C2 foo.dwo
DW_AT_comp_dir ("/tmp")
DW_AT_GNU_pubnames (true)
DW_AT_GNU_dwo_name ("/tmp/outdir/foo.dwo")
DW_AT_GNU_dwo_id (0xde4d396f3bf0e257)
DW_AT_low_pc (0x0000000000401100)
$ strace -o trace llvm-dwp -e outdir/hm -o outdir/hm.dwp
error: No such file or directory
$ cat trace | grep foo.dwo
openat(AT_FDCWD, "/tmp/tmp/outdir/foo.dwo", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
```
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D96678
The GPUDivergenceAnalysis is now renamed to just "DivergenceAnalysis"
since there is no conflict with LegacyDivergenceAnalysis. In the
legacy PM, this analysis can only be used through the legacy DA
serving as a wrapper. It is now made available as a pass in the new
PM, and has no relation with the legacy DA.
The new DA currently cannot handle irreducible control flow; its
presence can cause the analysis to run indefinitely. The analysis is
now modified to detect this and report all instructions in the
function as divergent. This is super conservative, but allows the
analysis to be used without hanging the compiler.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D96615
The few options are niche. They solved a problem which was traditionally solved
with more shell commands (`llvm-readelf -n` fetches the Build ID. Then
`ln` is used to hard link the file to a directory derived from the Build ID.)
Due to limitation, they are no longer used by Fuchsia and they don't appear to
be used elsewhere (checked with Google Search and Debian Code Search). So delete
them without a transition period.
Announcement: https://lists.llvm.org/pipermail/llvm-dev/2021-February/148446.html
Differential Revision: https://reviews.llvm.org/D96310
This adds colons to separate the file name from the message, removes a
duplicate space, and removes a trailing full stop from some messages.
These help bring the error messages into line with other tools, as well
as making all llvm-nm message more self-consistent.
Differential Revision: https://reviews.llvm.org/D96601
Reviewed by: Higuoxing, rupprecht, MaskRay
This version of the patch includes a fix for the cfi failures.
(undoes the revert commit 7db390cc7738a9ba0ed7d4ca59ab6ea2e69c47e9)
It also undoes reverts of follow-up patches that also needed reverting
originally:
* [LTO] Add option enable NewPM with LTOCodeGenerator.
(undoes revert commit 0a17664b47c153aa26a0d31b4835f26375440ec6)
* [LTOCodeGenerator] Use lto::Config for options (NFC)."
(undoes revert commit b0a8e41cfff717ff067bf63412d6edb0280608cd)
As of binutils 2.36, GNU strip calls chown(2) for "sudo strip foo" and
"sudo strip foo -o foo", but no "sudo strip foo -o bar" or "sudo strip
foo -o ./foo". In other words, while "sudo strip foo -o bar" creates a
new file bar with root access, "sudo strip foo" will keep the owner and
group of foo unchanged. Currently llvm-objcopy and llvm-strip behave
differently, always changing the owner and gropu to root. The
discrepancy prevents Chrome OS from migrating to llvm-objcopy and
llvm-strip as they change file ownership and cause intended users/groups
to lose access when invoked by sudo with the following sequence
(recommended in man page of GNU strip).
1.<Link the executable as normal.>
1.<Copy "foo" to "foo.full">
1.<Run "strip --strip-debug foo">
1.<Run "objcopy --add-gnu-debuglink=foo.full foo">
This patch makes llvm-objcopy and llvm-strip follow GNU's behavior.
Link: crbug.com/1108880
It appears some instructions doesn't have the debug location info and the symbolizer will return an empty call stack for them which will cause some crash later in profile unwinding. Actually we do not record the sample info for them, so this change just filter out those instruction.
As those instruction would appears at the begin and end of the instruction list, without them we need to add the boundary check for IP `advance` and `backward`.
Also for pseudo probe based profile, we actually don't need the symbolized location info, so here just change to use an empty stack for it. This could save half of the binary loading time.
Differential Revision: https://reviews.llvm.org/D96434
This include some changes related with PerfReader's the input check and command line change:
1) It appears there might be thousands of leading MMAP-Event line in the perfscript for large workload. For this case, the 4k threshold is not eligible to determine it's a hybrid sample. This change renovated the `isHybridPerfScript` by going through the script without threshold limitation checking whether there is a non-empty call stack immediately followed by a LBR sample. It will stop once it find a valid one.
2) Added several input validations for the command line switches in PerfReader.
3) Changed the command line `show-disassembly` to `show-disassembly-only`, it will print to stdout and exit early which leave an empty output profile.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D96387
This patch adds a pass to replace calls to vector intrinsics (i.e., LLVM
intrinsics operating on vector operands) with calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with calls to vector
libraries when scalar calls to intrinsics are vectorized by the Loop- or
SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM intrinsics
already operating on vector operands, e.g., if such code was generated
by MLIR. For the replacement, information from the TargetLibraryInfo,
e.g., as specified via -vector-library is used.
This is a re-try of the original commit 2303e93e66 that was reverted
due to pass manager problems. Other minor changes have also been made.
Differential Revision: https://reviews.llvm.org/D95373
The gold LTO plugin uses a set of hooks to implements emit-llvm and capture intermediate file generated during LTO. The hooks are called by each lto backend thread with a taskID as argument to differentiate between threads and tasks. Currently, all threads are overwriting the same file which results into only the intermediate output of the last backend thread to be preserved. This diff encodes the taskID into the filename.
Reviewed By: tejohnson, wenlei
Differential Revision: https://reviews.llvm.org/D96173
Include x86 intrinsics only when compiling for x86_64
or i386. _MSC_VER no longer implies x86.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D96498
To align with https://reviews.llvm.org/D95547, we need to add brackets for context id before initializing the `SampleContext`.
Also added test cases for extended binary format from llvm-profgen side.
Differential Revision: https://reviews.llvm.org/D95929
This attempts to move all tools over to using `add_llvm_library` for
better consistency. After doing this, I noticed it ended up as nearly a
reimplementation of https://reviews.llvm.org/rL342148, which later got
reverted in r342336 (b09a8c9bd9b819741b38071a7ccd95042ef2643a).
With ccache and ninja on a large core machine (40), I haven't run into
build errors, so I'm hopeful it's better now, though it doesn't seem to
be any different / new.
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D90970
It seems nicer to list passes given a flag rather than displaying all
passes in opt --help.
This is awkwardly structured because a PassBuilder is required, but
reusing the PassBuilder in runPassPipeline() doesn't work because we
read the input IR before getting to runPassPipeline(). So printing the
list of passes needs to happen before reading the input IR. If we remove
the legacy PM code in main() and move everything from NewPMDriver.cpp
into opt.cpp, we can create the PassBuilder before reading IR and check
if we should print the list of passes and exit. But until then this hack
seems fine.
Compared to the legacy PM, the new PM passes are lacking descriptions.
We'll need to figure out a way to add descriptions if we think this is
important.
Also, this only works for passes specified in PassRegistry.def. If we
want to print other custom registered passes, we'll need a different
mechanism.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D96101
parseSectionContents expects to skip regions not described by DWARF. With my
pending DebugInfo/Symbolize change, the filename can be recovered and there
will be more IndirectInstructions entries.
The current support only printed coredump notes, but most binaries also
contain notes. This change adds names for four FreeBSD-specific notes and
pretty-prints three of them:
NT_FREEBSD_ABI_TAG:
This note holds a 32-bit (decimal) integer containing the value of the
__FreeBSD_version macro, which is defined in crt1.o and will hold a value
such as 1300076 for a binary build on a FreeBSD 13 system.
NT_FREEBSD_ARCH_TAG:
A string containing the value of the build-time MACHINE_ARCH
NT_FREEBSD_FEATURE_CTL: A 32-bit flag that indicates to the kernel that
the binary wants certain bevahiour. Examples include setting
NT_FREEBSD_FCTL_ASLR_DISABLE which tells the kernel to disable ASLR.
After this change llvm-readobj also no longer decodes coredump-only
FreeBSD notes in non-coredump files. I've also converted the
note-freebsd.s test to use yaml2obj instead of llvm-mc.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D74393
Currently, if the note name is known, but the value isn't we don't print
the contents.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D74367
This attempts to move all tools over to using `add_llvm_library` for
better consistency. After doing this, I noticed it ended up as nearly a
reimplementation of https://reviews.llvm.org/rL342148, which later got
reverted in r342336 (b09a8c9bd9b819741b38071a7ccd95042ef2643a).
With ccache and ninja on a large core machine (40), I haven't run into
build errors, so I'm hopeful it's better now, though it doesn't seem to
be any different / new.
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D90970
This patch adds a pass to replace calls to vector intrinsics
(i.e., LLVM intrinsics operating on vector operands) with
calls to a vector library.
Currently, calls to LLVM intrinsics are only replaced with
calls to vector libraries when scalar calls to intrinsics are
vectorized by the Loop- or SLP-Vectorizer.
With this pass, it is now possible to replace calls to LLVM
intrinsics already operating on vector operands, e.g., if
such code was generated by MLIR. For the replacement,
information from the TargetLibraryInfo, e.g., as specified
via -vector-library is used.
Differential Revision: https://reviews.llvm.org/D95373
As mentioned in TODO comment, casting double to float causes NaNs to change bits.
To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode.
Patch by Yuta Saito.
Differential Revision: https://reviews.llvm.org/D77384
when we skip the call stack starting with an external address, we should also skip the bottom LBR entry, otherwise it will cause a truncated context issue.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D95480
This change allows merging and trimming cold context profile in llvm-profgen to solve profile size bloat problem. Currently when the profile's total sample is below threshold(supported by a switch), it will be considered cold and merged into a base context-less profile, which will at least keep the profile quality as good as the baseline(non-cs).
For example, two input profiles:
[main @ foo @ bar]:60
[main @ bar]:50
Under threshold = 100, the two profiles will be merge into one with the base context, get result:
[bar]:110
Added two switches:
`--csprof-cold-thres=<value>`: Specified the total samples threshold for a context profile to be considered cold, with 100 being the default. Any cold context profiles will be merged into context-less base profile by default.
`--csprof-keep-cold`: Force profile generation to keep cold context profiles instead of dropping them. By default, any cold context will not be written to output profile.
Results:
Though not yet evaluating it with the latest CSSPGO, our internal branch shows neutral on performance but significantly reduce the profile size. Detailed evaluation on llvm-profgen with CSSPGO will come later.
Differential Revision: https://reviews.llvm.org/D94111
Warnings have been added for three cases (PR41905): (1) missing debug info, (2)
the source file cannot be found, (3) the debug info points at a line beyond the
end of the file.
(1) is probably less useful. This was brought up once on
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141264.html and two
internal users mentioned it to me that it was annoying. (I personally
find the warning confusing, too.)
Users specify --source to get additional information if sources happen to be
available. If sources are not available, it should be obvious as the output
will have no interleaved source lines. The warning can be especially annoying
when using llvm-objdump -S on a bunch of files.
This patch drops the warning when there is no debug info.
(If LLVMSymbolizer::symbolizeCode returns an `Error`, there will still be
an error. There is currently no test for an `Error` return value.
The only code path is probably a broken symbol table, but we probably already emit a warning
in that case)
`source-interleave-prefix.test` has an inappropriate "malformed" test - the test simply has no
.debug_* because new llc does not produce debug info when the filename is empty (invalid).
I have tried tampering the header of .debug_info/.debug_line but llvm-symbolizer does not warn.
This patch does not intend to add the missing test coverage.
Differential Revision: https://reviews.llvm.org/D88715
For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.
Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.
Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.
Differential Revision: https://reviews.llvm.org/D94110
This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]
Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.
Differential Revision: https://reviews.llvm.org/D93556
For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.
Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.
Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.
Differential Revision: https://reviews.llvm.org/D94110
This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]
Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.
Differential Revision: https://reviews.llvm.org/D93556
This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.
Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**Implementation**
- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reading range counter is responsible for recording function body samples and inferring caller's body samples.
- `populateBoundarySamplesWithProbes` reading branch counter is responsible for recording call site target samples.
- Each sample is recorded with its calling context(named `ContextId`). Remind that the probe based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two part: one from the probe stack strings' concatenation and other one from the leaf frame probe.
- Added regression test
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92998
In binutils, the flag is defined for ELFOSABI_GNU and ELFOSABI_FREEBSD.
It can be used to mark a section as a GC root.
In practice, the flag has generic semantics and can be applied to many
EI_OSABI values, so we consider it generic.
Differential Revision: https://reviews.llvm.org/D95728
This patch let the yaml encoding use Hex64 values for NumBlocks, BB AddressOffset, BB Size, and BB Metadata.
Additionally, it changes the decoded values in elf2yaml to uint64_t to match DataExtractor::getULEB128 return type.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D95767
This is consistent with BFD objcopy.
Previously llvm objcopy would allocate space for SHT_NOBITS sections
often resulting in enormous binary files.
New test case (binary-paddr.test %t6).
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D95569
Current dsymutil implementation of hasLiveMemoryLocation()/hasLiveAddressRange()
and applyValidRelocs() assume that calls should be done in certain order
(from first Dies to last). Multi-thread implementation might call these methods
in other order(it might process compilation units in order other than they are physically
located), so we remove restriction that searching for relocations should be done
in ascending order. This change does not introduce noticable performance degradation.
The testing results for clang binary:
golden-dsymutil/dsymutil 23787992
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
build-Release/bin/dsymutil 23855616
clang MD5: 5efa8fd9355ebf81b65f24db5375caa2
elapsed time=91sec
Differential Revision: https://reviews.llvm.org/D93106
Fixes https://bugs.llvm.org/show_bug.cgi?id=48882.
If the input file does not exist (or has a reading error), the
following code will crash if there are two or more input addresses.
```
auto ResOrErr = Symbolizer.symbolizeInlinedCode(
ModuleName, {Offset, object::SectionedAddress::UndefSection});
Printer << (error(ResOrErr) ? DILineInfo() : ResOrErr.get().getFrame(0));
```
For the first address, `symbolizeInlinedCode` returns an error.
For the second address, `symbolizeInlinedCode` returns an empty result
(not an error) and `.getFrame(0)` will crash.
Differential revision: https://reviews.llvm.org/D95609
This patch adds an option to enable the new pass manager in
LTOCodeGenerator. It also updates a few tests with legacy PM specific
tests, which started failing after 6a59f0560648 when
LLVM_ENABLE_NEW_PASS_MANAGER=true.
This patch updates LTOCodeGenerator to use the utilities provided by
LTOBackend to run middle-end optimizations and backend code generation.
This is a first step towards unifying the code used by libLTO's C API
and the newer, C++ interface (see PR41541).
The immediate motivation is to allow using the new pass manager when
doing LTO using libLTO's C API, which is used on Darwin, among others.
With the changes, there are no codegen/stats differences when building
MultiSource/SPEC2000/SPEC2006 on Darwin X86 with LTO, compared
to without the patch.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D94487
Compact unwind entries have 8 bits for the encoding-table offset:
* offsets 0..126 reference the global commmon-encodings table, while
* offsets 127..255 reference a per-second-level-page table.
This diff teaches `llvm-objdump` to print this per-page encodings table.
Differential Revision: https://reviews.llvm.org/D93265
splitCodeGen does not need to take ownership of the module, as it
currently clones the original module for each split operation.
There is an ~4 year old fixme to change that, but until this is
addressed, the function can just take a reference to the module.
This makes the transition of LTOCodeGenerator to use LTOBackend a bit
easier, because under some circumstances, LTOCodeGenerator needs to
write the original module back after codegen.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95222
This patch replaces use of deprecated gethostbyname by getaddrinfo.
Author: Rafik Zurob
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D95477
Fixes https://bugs.llvm.org/show_bug.cgi?id=43543
Currently we report "The file was not recognized as a valid object file" for BC files.
Also, we terminate dumping.
Instead we could report a better warning and try to continue dumping other files.
This is what this patch implements.
Differential revision: https://reviews.llvm.org/D95605
This change brings up support of context-sensitive profiles in the format of extended binary. Existing sample profile reader/writer/merger code is being tweaked to reflect the fact of bracketed input contexts, like (`[...]`). The paired brackets are also needed in extbinary profiles because we don't yet have an otherwise good way to tell calling contexts apart from regular function names since the context delimiter `@` can somehow serve as a part of the C++ mangled names.
Reviewed By: wmi, wenlei
Differential Revision: https://reviews.llvm.org/D95547
Identify dynamically exported symbols (--export-dynamic[-symbol=],
--dynamic-list=, or definitions needed to preempt shared objects) and
prevent their LTO visibility from being upgraded.
This helps avoid use of whole program devirtualization when there may
be overrides in dynamic libraries.
Differential Revision: https://reviews.llvm.org/D91583
FaultsMapParser lived in CodeGen and was forcing llvm-objdump to
link CodeGen and everything CodeGen depends on.
This was previously attempted in r240364 to fix a link failure.
The CodeGen dependency was independently added to fix the same
link failure, and that ended up being kept.
Removing the dependency seems like the correct layering for
llvm-objdump.
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D95414
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
A default version (@@) is only available for defined symbols.
Currently we use "@@" for undefined symbols too.
This patch fixes the issue and improves our test case.
Differential revision: https://reviews.llvm.org/D95219
The llvm-dwp tool hard-codes the target triple to x86. Instead, deduce the
target triple from the object files being read.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93749
This was discussed in D93678 thread.
Currently we have one special chunk - Fill.
This patch re implements the "SectionHeaderTable" key to become a special chunk too.
With that we are able to place the section header table at any location,
just like we place sections.
Differential revision: https://reviews.llvm.org/D95140
To simplify the transition to using LTOBackend, move DisableVerify to
the LTOCodeGenerator class, like most/all other options.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95223
This patch adds a new InstModificationIRStrategy to mutate flags/options
for instructions. For example, it may add or remove nuw/nsw flags from
add, mul, sub, shl instructions or change the predicate for icmp
instructions.
Subtle changes such as those mentioned above should lead to a more
interesting range of inputs. The presence or absence of overflow flags
can expose subtle bugs, for example.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D94905
The target features are obtained as a list of features/attributes.
Instead of storing them in a single string, store the vector. This
matches lto::Config's behavior and simplifies the transition to
lto::backend().
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95224
lto::Config has a field to control whether the build is "freestanding"
(no builtins) or not, but it is not hooked up to the code actually
running the passes.
This patch adds support for the flag to both the code that runs
optimization with the new and old pass managers, by explicitly adding a
TargetLibraryInfo instance. If Freestanding is true, all library functions
are disabled.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D94630
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Initially reverted due to BasicAA running analyses in an unspecified
order (multiple function calls as parameters), fixed by fetching
analyses before the call to construct BasicAA.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
We tend to assume that the AA pipeline is by default the default AA
pipeline and it's confusing when it's empty instead.
PR48779
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D95117
This makes the following improvements.
For `SHT_GNU_versym`:
* yaml2obj: set `sh_link` to index of `.dynsym` section automatically.
For `SHT_GNU_verdef`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version definitions.
For `SHT_GNU_verneed`:
* yaml2obj: set `sh_link` to index of `.dynstr` section automatically.
* yaml2obj: set `sh_info` field automatically.
* obj2yaml: don't dump the `Info` field when its value matches the number of version dependencies.
Also, simplifies few test cases.
Differential revision: https://reviews.llvm.org/D94956
The modification time in the debug map is expressed using second
precision, while the modification time returned by the filesystem could
be more precise. Avoid spurious warnings about timestamp mismatches by
truncating the modification time reported by the system to seconds.
Linking large bitcode archives currently takes a lot of time with llvm-link,
this patch adds couple improvements which reduce link time for archives
- Use one Linker instance for archive instead of recreating it for each member
- Lazy load archive members
Reviewed By: tra, jdoerfert
Differential Revision: https://reviews.llvm.org/D94643
In preparation for turning on opt's -enable-new-pm by default, this pins
uses of passes via the legacy "opt -passname" with pass names beginning
with "polly-" and "polyhedral-info" to the legacy PM. Many of these
tests use -analyze, which isn't supported in the new PM.
(This doesn't affect uses of "opt -passes=passname").
rL240766 accidentally removed `-polly-prepare` in
phi_not_grouped_at_top.ll, and it also doesn't use the output of
-analyze.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D94266