Symbol tables can have symbols with no size in mach-o files that were failing to get combined into a single entry. This resulted in many duplicate entries for the same address and made gsym files larger.
Differential Revision: https://reviews.llvm.org/D105068
The Length, AbbrOffset and Values fields of the debug_info section are
optional. This patch helps remove them and simplify test cases.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D86857
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
'InitialLength' is replaced with 'Format' (DWARF32 by default) and 'Length' in this patch.
Besides, test cases for DWARFv4 and DWARFv5, DWARF32 and DWARF64 is
added.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D82622
Summary:
The Offset provides the offset within the function in a SourceLocation struct. This allows us to show the byte offset within a function. We also track offsets within inline functions as well. Updated the lookup tests to verify the offset for functions and inline functions.
0x1000: main + 32 @ /tmp/main.cpp:45
Reviewers: labath, aadsm, serhiy.redko, jankratochvil, xiaobai, wallace, aprantl, JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74680
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.
The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.
Reviewers: aprantl, dblaikie, probinson
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74450
This is how it should've been and brings it more in line with
std::string_view. There should be no functional change here.
This is mostly mechanical from a custom clang-tidy check, with a lot of
manual fixups. It uncovers a lot of minor inefficiencies.
This doesn't actually modify StringRef yet, I'll do that in a follow-up.
Summary:
Lookup functions are designed to not fully decode a FunctionInfo, LineTable or InlineInfo, they decode only what is needed into a LookupResult object. This allows lookups to avoid costly memory allocations and avoid parsing large amounts of information one a suitable match is found.
LookupResult objects contain the address that was looked up, the concrete function address range, the name of the concrete function, and a list of source locations. One for each inline function, and one for the concrete function. This allows one address to turn into multiple frames and improves the signal you get when symbolicating addresses in GSYM files.
Reviewers: labath, aprantl
Subscribers: mgorny, hiraditya, llvm-commits, lldb-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70993
This patch adds the ability to create GSYM files with GsymCreator, and read them with GsymReader. Full testing has been added for both new classes.
This patch differs from the original patch https://reviews.llvm.org/D53379 in that is uses a StringTableBuilder class from llvm instead of a custom version. Support for big and little endian files has been added. If the endianness matches the current host, we use efficient extraction for the header, address table and address info offset tables.
Differential Revision: https://reviews.llvm.org/D68744
llvm-svn: 374381
This patch adds the llvm::gsym::Header class which appears at the start of a stand alone GSYM file, or in the first bytes of the GSYM data in a GSYM section within a file. Added encode and decode methods with full error handling and full tests.
Differential Revision: https://reviews.llvm.org/D67666
llvm-svn: 372149
This patch adds encoding and decoding of the FunctionInfo objects along with full error handling and tests. Full details of the FunctionInfo encoding format appear in the FunctionInfo.h header file.
Differential Revision: https://reviews.llvm.org/D67506
llvm-svn: 372135
This patch adds the ability to create a gsym::LineTable object, populate it, encode and decode it and test all functionality.
The full format of the LineTable encoding is specified in the header file LineTable.h.
Differential Revision: https://reviews.llvm.org/D66602
llvm-svn: 371657
This patch adds the ability to encode and decode InlineInfo objects and adds test coverage. Error handling is introduced in the encoding and decoding which will be used from here on out for remaining patches.
Differential Revision: https://reviews.llvm.org/D66600
llvm-svn: 370936
The full GSYM patch started with: https://reviews.llvm.org/D53379
This patch add the ability to encode data using the new llvm::gsym::FileWriter class.
FileWriter is a simplified binary data writer class that doesn't require targets, target definitions, architectures, or require any other optional compile time libraries to be enabled via the build process. This class needs the ability to seek to different spots in the binary data that it produces to fix up offsets and sizes in GSYM data. It currently uses std::ostream over llvm::raw_ostream because llvm::raw_ostream doesn't support seeking which is required when encoding and decoding GSYM data.
AddressRange objects are encoded and decoded to be relative to a base address. This will be the FunctionInfo's start address if the AddressRange is directly contained in a FunctionInfo, or a base address of the containing parent AddressRange or AddressRanges. This allows address ranges to be efficiently encoded using ULEB128 encodings as we encode the offset and size of each range instead of full addresses. This also makes encoded addresses easy to relocate as we just need to relocate one base address.
Differential Revision: https://reviews.llvm.org/D63828
llvm-svn: 369587
Delete unnecessary getters of AddressRange.
Simplify AddressRange::size(): Start <= End check should be checked in an upper layer.
Delete isContiguousWith() that doesn't make sense.
Simplify AddressRanges::insert. Delete commented code. Fix it when more than 1 ranges are to be deleted.
Delete trailing newline.
llvm-svn: 364637
The full GSYM patch started with: https://reviews.llvm.org/D53379
In that patch we wanted to split up getting GSYM into the LLVM code base so we are not committing too much code at once.
This is a first in a series of patches where I only add the foundation classes along with complete unit tests. They provide the foundation for encoding and decoding a GSYM file.
File entries are defined in llvm::gsym::FileEntry. This class splits the file up into a directory and filename represented by uniqued string table offsets. This allows all files that are referred to in a GSYM file to be encoded as 1 based indexes into a global file table in the GSYM file.
Function information in stored in llvm::gsym::FunctionInfo. This object represents a contiguous address range that has a name and range with an optional line table and inline call stack information.
Line table entries are defined in llvm::gsym::LineEntry. They store only address, file and line information to keep the line tables simple and allows the information to be efficiently encoded in a subsequent patch.
Inline information is defined in llvm::gsym::InlineInfo. These structs store the name of the inline function, along with one or more address ranges, and the file and line that called this function. They also contain any child inline information.
There are also utility classes for address ranges in llvm::gsym::AddressRange, and string table support in llvm::gsym::StringTable which are simple classes.
The unit tests test all the APIs on these simple classes so they will be ready for the next patches where we will create GSYM files and parse GSYM files.
Differential Revision: https://reviews.llvm.org/D63104
llvm-svn: 364427