This patch introduces a DIPrinter interface to implement by different output style printer implementations. DIPrinterGNU and DIPrinterLLVM implement the GNU and LLVM output style printing respectively. No functional changes.
This refactoring clarifies and simplifies the code, and makes a new output style addition easier.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D98994
In future patches I will be setting the IsText parameter frequently so I will refactor the args to be in the following order. I have removed the FileSize parameter because it is never used.
```
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFile(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true, bool IsVolatile = false);
static ErrorOr<std::unique_ptr<MemoryBuffer>>
getFileOrSTDIN(const Twine &Filename, bool IsText = false,
bool RequiresNullTerminator = true);
static ErrorOr<std::unique_ptr<MB>>
getFileAux(const Twine &Filename, uint64_t MapSize, uint64_t Offset,
bool IsText, bool RequiresNullTerminator, bool IsVolatile);
static ErrorOr<std::unique_ptr<WritableMemoryBuffer>>
getFile(const Twine &Filename, bool IsVolatile = false);
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D99182
Maybe there's a way to make them work, but until I've investigated
if tools can consume large PDBs, erroring out is better than slowly
and silently consuming all available ram due to internal invariants
being violated.
(Patch to make writing larger files work at
https://bugs.chromium.org/p/chromium/issues/detail?id=1179085#c25
but I haven't had time to check if windbg & co can consume these
large PDBs. llvm-pdbutil can't, but we can fix that one at least :) )
Differential Revision: https://reviews.llvm.org/D98788
The LINK_COMPONENTS dependency between DebugInfoCodeView and
DebugInfoMSF is unnecessary. Breaking them would allow a more
fine-controlled distribution.
Patch By: dangyi
Differential Revision: https://reviews.llvm.org/D98465
This reverts commit bacf9cf2c5cdec3567580e5030c4c82f42b3d745 and
reinstates commit 1a9bd5b81328adf0dd5a8b4f3ad5949463e66da3.
Reverting this commit did not appear to make the problem go away, so we
can go ahead and reland it.
In some cases a broken or invalid debug info could cause a crash in DWARFUnit::getInlinedChainForAddress during parsing a chain of in-lined functions. This patch fixes this issue.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D98119
`AttributeSpec` does not contain values while `DWARFAttribute` already
does. Therefore one no longer needs to pass `uint64_t *OffsetPtr`.
Differential Revision: https://reviews.llvm.org/D98194
D81469 introduced a check to error on CIE version different
than 1 for eh_frame, but older compilers mistakenly create binaries
with this version set to 3 for DWARF4 or 4 to DWARF5. Move the check
to dump time instead of eh_frame parse time, so we can be tolerant
with older binaries.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D97830
Their names don't convey much information, so they should be excluded.
The behavior matches addr2line.
Differential Revision: https://reviews.llvm.org/D96617
This patch intended to provide additional interface to LLVMsymbolizer
such that they work directly on object files. There is an existing
method - symbolizecode which takes an object file, this patch provides
similar overloads for symbolizeInlinedCode, symbolizeData,
symbolizeFrame. This can be useful for clients who already have a
in-memory object files to symbolize for.
Patch By: pvellien (praveen velliengiri)
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D95232
This fixes coff-dwarf.test on some build bots.
The test relies on the sort order and prefers main (StorageClass: External) to .text (StorageClass: Static).
Before d08bd13ac8a560c4645e17e192ca07e1bdcd2895, only `SymbolRef::ST_Function`
symbols were used for .symtab symbolization. That commit added a `"DATA"` mode
to llvm-symbolizer which used `SymbolRef::ST_Data` symbols for symbolization.
Since function and data symbols have different addresses, we don't need to
differentiate the two modes. This patches unifies the two modes to simplify
code.
`"DATA"` is used by `compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp`.
`check-hwasan` and `check-tsan` have runtime tests.
Differential Revision: https://reviews.llvm.org/D96322
The ELF spec says:
> STT_FILE: Conventionally, the symbol's name gives the name of the source file associated with the object file. A file symbol has STB_LOCAL binding, its section index is SHN_ABS, and it precedes the other STB_LOCAL symbols for the file, if it is present.
For a local symbol, the preceding STT_FILE symbol is almost always in the same
file[1]. GNU addr2line uses this heuristic to retrieve the filename associated
with a local symbol (e.g. internal linkage functions in C/C++).
GNU addr2line can assign STT_FILE filename to a non-local symbol, too, but the trick
only works if no regular symbol precede STT_FILE. This patch does not implement this corner case
(not useful for most executables which have more than one files).
In case of filename mismatch between .debug_line & .symtab, arbitrarily make .debug_line win.
[1]: LLD does not synthesize STT_FILE symbols
(https://bugs.llvm.org/show_bug.cgi?id=48023 see also
https://sourceware.org/bugzilla/show_bug.cgi?id=26822). An assembly file
without `.file` directives can cause mis-attribution. This is an edge case.
Differential Revision: https://reviews.llvm.org/D95927
In assembly files, omitting `.type foo,@function` is common. Such functions have
type `STT_NOTYPE` and llvm-symbolizer reports `??` for them.
An ifunc symbol usually has an associated resolver symbol which is defined at
the same address. Returning either one is fine for symbolization. The resolver
symbol may not end up in the symbol table if (object file) `.L` is used (linked
image) .symtab is stripped while .dynsym is retained.
This patch allows ELF STT_NOTYPE/STT_GNU_IFUNC symbols for .symtab symbolization.
I have left TODO in the test files for an unimplemented STT_FILE heuristic.
Differential Revision: https://reviews.llvm.org/D95916
GCC warning:
```
/llvm-project/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp: In member function ‘llvm::Expected<long unsigned int> llvm::dwarf::CFIProgram::Instruction::getOperandAsUnsigned(const llvm::dwarf::CFIProgram&, uint32_t) const’:
/llvm-project/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp:425:1: warning: control reaches end of non-void function [-Wreturn-type]
425 | }
| ^
/llvm-project/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp: In member function ‘llvm::Expected<long int> llvm::dwarf::CFIProgram::Instruction::getOperandAsSigned(const llvm::dwarf::CFIProgram&, uint32_t) const’:
/llvm-project/llvm/lib/DebugInfo/DWARF/DWARFDebugFrame.cpp:477:1: warning: control reaches end of non-void function [-Wreturn-type]
477 | }
| ^
```
This patch adds the ability to evaluate the state machine for CIE and FDE unwind objects and produce a UnwindTable with all UnwindRow objects needed to unwind registers. It will also dump the UnwindTable for each CIE and FDE when dumping DWARF .debug_frame or .eh_frame sections in llvm-dwarfdump or llvm-objdump. This allows users to see what the unwind rows actually look like for a given CIE or FDE instead of just seeing a list of opcodes.
This patch adds new classes: UnwindLocation, RegisterLocations, UnwindRow, and UnwindTable.
UnwindLocation is a class that describes how to unwind a register or Call Frame Address (CFA).
RegisterLocations is a class that tracks registers and their UnwindLocations. It gets populated when parsing the DWARF call frame instruction opcodes for a unwind row. The registers are mapped from their register numbers to the UnwindLocation in a map.
UnwindRow contains the result of evaluating a row of DWARF call frame instructions for the CIE, or a row from a FDE. The CIE can produce a set of initial instructions that each FDE that points to that CIE will use as the seed for the state machine when parsing FDE opcodes. A UnwindRow for a CIE will not have a valid address, whille a UnwindRow for a FDE will have a valid address.
The UnwindTable is a class that contains a sorted (by address) vector of UnwindRow objects and is the result of parsing all opcodes in a CIE, or FDE. Parsing a CIE should produce a UnwindTable with a single row. Parsing a FDE will produce a UnwindTable with one or more UnwindRow objects where all UnwindRow objects have valid addresses. The rows in the UnwindTable will be sorted from lowest Address to highest after parsing the state machine, or an error will be returned if the table isn't sorted. To parse a UnwindTable clients can use the following methods:
static Expected<UnwindTable> UnwindTable::create(const CIE *Cie);
static Expected<UnwindTable> UnwindTable::create(const FDE *Fde);
A valid table will be returned if the DWARF call frame instruction opcodes have no encoding errors. There are a few things that can go wrong during the evaluation of the state machine and these create functions will catch and return them.
Differential Revision: https://reviews.llvm.org/D89845
This reverts commit 5b7aef6eb4b2930971029b984cb2360f7682e5a5 and relands
6529d7c5a45b1b9588e512013b02f891d71bc134.
The ASan error was debugged and determined to be the fault of an invalid
object file input in our test suite, which was fixed by my last change.
LLD's project policy is that it assumes input objects are valid, so I
have added a comment about this assumption to the relocation bounds
check.
This is a pretty classic optimization. Instead of processing symbol
records and copying them to temporary storage, do a first pass to
measure how large the module symbol stream will be, and then copy the
data into place in the PDB file. This requires defering relocation until
much later, which accounts for most of the complexity in this patch.
This patch avoids copying the contents of all live .debug$S sections
into heap memory, which is worth about 20% of private memory usage when
making PDBs. However, this is not an unmitigated performance win,
because it can be faster to read dense, temporary, heap data than it is
to iterate symbol records in object file backed memory a second time.
Results on release chrome.dll:
peak mem: 5164.89MB -> 4072.19MB (-1,092.7MB, -21.2%)
wall-j1: 0m30.844s -> 0m32.094s (slightly slower)
wall-j3: 0m20.968s -> 0m20.312s (slightly faster)
wall-j8: 0m19.062s -> 0m17.672s (meaningfully faster)
I gathered similar numbers for a debug, component build of content.dll
in Chrome, and the performance impact of this change was in the noise.
The memory usage reduction was visible and similar.
Because of the new parallelism in the PDB commit phase, more cores makes
the new approach faster. I'm assuming that most C++ developer machines
these days are at least quad core, so I think this is a win.
Differential Revision: https://reviews.llvm.org/D94267
Fixes issue where if a line section doesn't start with a line number
then the addresses at the beginning of the section don't have line numbers.
For example, for a line section like this
```
0001:00000010-00000014, line/column/addr entries = 1
7 00000013 !
```
a line number wouldn't be found for addresses from 10 to 12.
This matches behavior when using the DIA SDK.
Differential Revision: https://reviews.llvm.org/D93306
The existing code handles this correctly and I checked that the code
in NativeInlineSiteSymbol also handles this correctly, but it was
wrong in the NativeFunctionSymbol code.
Differential Revision: https://reviews.llvm.org/D92134
llvm-symbolizer used to use the DIA SDK for symbolization on
Windows; this patch switches to using native symbolization, which was
implemented recently.
Users can still make the symbolizer use DIA by adding the `-dia` flag
in the LLVM_SYMBOLIZER_OPTS environment variable.
Differential Revision: https://reviews.llvm.org/D91814
This allows to reuse the RelocationResolver from the code
that doesn't want to deal with `RelocationRef` class.
I am going to use it in llvm-readobj. See the description
of D91530 for more details.
Differential revision: https://reviews.llvm.org/D91533
In the current state, if getFromHash(0) is called and there's no CU with
dwo_id=0, the lookup will stop at an empty slot, then the check
`Rows[H].getSignature() != S` won't cause the lookup to fail and return
a nullptr (as it should), because the empty slot has a 0 in the
signature field, and a pointer to the empty slot will be incorrectly
returned.
This patch fixes this by using the index field in the hash entry to
check for empty slots: signature = 0 can match a valid hash but
according to the spec the index for an occupied slot will always be
non-zero.
Differential Revision: https://reviews.llvm.org/D91670
No longer rely on an external tool to build the llvm component layout.
Instead, leverage the existing `add_llvm_componentlibrary` cmake function and
introduce `add_llvm_component_group` to accurately describe component behavior.
These function store extra properties in the created targets. These properties
are processed once all components are defined to resolve library dependencies
and produce the header expected by llvm-config.
Differential Revision: https://reviews.llvm.org/D90848