It previously only worked when the key and value types were
both 4 byte integers. We now have a use case for a non trivial
value type, so we need to extend it to support arbitrary value
types, which means templatizing it.
llvm-svn: 327647
Summary:
Some PDB symbols do not have a valid VA or RVA but have Addr by Section and Offset. For example, a variable in thread-local storage has the following properties:
get_addressOffset: 0
get_addressSection: 5
get_lexicalParentId: 2
get_name: g_tls
get_symIndexId: 12
get_typeId: 4
get_dataKind: 6
get_symTag: 7
get_locationType: 2
This change provides a new method to locate line numbers by Section and Offset from those symbols.
Reviewers: zturner, rnk, llvm-commits
Subscribers: asmith, JDevlieghere
Differential Revision: https://reviews.llvm.org/D44407
llvm-svn: 327601
Summary:
This patch replaces the two switches which are deducing the size of
various forms with a single implementation. I have put the new
implementation into BinaryFormat, to avoid introducing dependencies
between the two independent libraries (DebugInfo and CodeGen) that need
this functionality.
Reviewers: aprantl, JDevlieghere, dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44418
llvm-svn: 327486
Whilst working on improvements to the error handling of the debug line
parsing code, I noticed that if an invalid offset were to be specified
in a call to getOrParseLineTable(), an entry in the LineTableMap would
still be created, even if the offset was not within the section range.
The immediate parsing attempt afterwards would fail (it would end up
getting a version of 0), and thereafter, any subsequent calls to
getOrParseLineTable or getLineTable would return the default-
constructed, invalid line table. In reality, we shouldn't even attempt
to parse this table, and we should always return a nullptr from these
two functions for this situation.
I have tested this via a unit test, which required some new framework
for unit testing debug line. My plan is to add quite a few more unit
tests for the new error reporting mechanism that will follow shortly,
hence the reason why the supporting code for the tests are written the
way they are - I intend to extend the DwarfGenerator class to support
generating debug line. At that point, I'll make sure that there are a
few positive test cases for this and the parsing code too.
Differential Revision: https://reviews.llvm.org/D44200
Reviewers: JDevlieghere, aprantl
llvm-svn: 326995
Summary: This helps to determine the line number for a PDB type with definition
Reviewers: zturner, llvm-commits, rnk
Reviewed By: zturner
Subscribers: rengolin, JDevlieghere
Differential Revision: https://reviews.llvm.org/D44119
llvm-svn: 326857
This was originally reported as a bug with the symptom being "cvdump
crashes when printing an LLD-linked PDB that has an S_FILESTATIC record
in it". After some additional investigation, I determined that this was
a symptom of a larger problem, and in fact the real problem was in the
way we emitted the global PDB string table. As evidence of this, you can
take any lld-generated PDB, run cvdump -stringtable on it, and it would
return no results.
My hypothesis was that cvdump could not *find* the string table to begin
with. Normally it would do this by looking in the "named stream map",
finding the string /names, and using its value as the stream index. If
this lookup fails, then cvdump would fail to load the string table.
To test this hypothesis, I looked at the name stream map generated by a
link.exe PDB, and I emitted exactly those bytes into an LLD-generated
PDB. Suddenly, cvdump could read our string table!
This code has always been hacky and we knew there was something we
didn't understand. After all, there were some comments to the effect of
"we have to emit strings in a specific order, otherwise things don't
work". The key to fixing this was finally understanding this.
The way it works is that it makes use of a generic serializable hash map
that maps integers to other integers. In this case, the "key" is the
offset into a buffer, and the value is the stream number. If you index
into the buffer at the offset specified by a given key, you find the
name. The underlying cause of all these problems is that we were using
the identity function for the hash. i.e. if a string's offset in the
buffer was 12, the hash value was 12. Instead, we need to hash the
string *at that offset*. There is an additional catch, in that we have
to compute the hash as a uint32 and then truncate it to uint16.
Making this work is a little bit annoying, because we use the same hash
table in other places as well, and normally just using the identity
function for the hash function is actually what's desired. I'm not
totally happy with the template goo I came up with, but it works in any
case.
The reason we never found this bug through our own testing is because we
were building a /parallel/ hash table (in the form of an
llvm::StringMap<>) and doing all of our lookups and "real" hash table
work against that. I deleted all of that code and now everything goes
through the real hash table. Then, to test it, I added a unit test which
adds 7 strings and queries the associated values. I test every possible
insertion order permutation of these 7 strings, to verify that it really
does work as expected.
Differential Revision: https://reviews.llvm.org/D43326
llvm-svn: 325386
We have some code to try to determine how many pieces an MSF
Free Page Map is split into, and this code had an off by one
error which would cause the calculation to be incorrect when
there were exactly 4096*k + 1 blocks in an MSF file.
Original investigation and patch outline by Colden Cullen.
Differential Revision: https://reviews.llvm.org/D41742
llvm-svn: 321880
Currently it's not possible to access MCSubtargetInfo from a TgtMCAsmBackend.
D20830 threaded an MCSubtargetInfo reference through
MCAsmBackend::relaxInstruction, but this isn't the only function that would
benefit from access. This patch removes the Triple and CPUString arguments
from createMCAsmBackend and replaces them with MCSubtargetInfo.
This patch just changes the interface without making any intentional
functional changes. Once in, several cleanups are possible:
* Get rid of the awkward MCSubtargetInfo handling in ARMAsmBackend
* Support 16-bit instructions when valid in MipsAsmBackend::writeNopData
* Get rid of the CPU string parsing in X86AsmBackend and just use a SubtargetFeature for HasNopl
* Emit 16-bit nops in RISCVAsmBackend::writeNopData if the compressed instruction set extension is enabled (see D41221)
This change initially exposed PR35686, which has since been resolved in r321026.
Differential Revision: https://reviews.llvm.org/D41349
llvm-svn: 321692
Adds missing support for DW_FORM_data16.
Update of r320852/r320886, fixing the unittest again, this time use a
raw char string for the test data.
Differential Revision: https://reviews.llvm.org/D41090
llvm-svn: 321011
Adds missing support for DW_FORM_data16.
Update of r320852, fixing the unittest to use a hand-coded struct
instead of std::array to guarantee data layout.
Differential Revision: https://reviews.llvm.org/D41090
llvm-svn: 320886
Previously, when linking against libcmt from the MSVC runtime,
lld-link /verbose would show "Ignoring unknown symbol record
with kind 0x1006". It turns out this was because
TypeIndexDiscovery did not handle S_REGISTER records, so these
records were not getting properly remapped.
Patch by: Alexnadre Ganea
Differential Revision: https://reviews.llvm.org/D40919
llvm-svn: 320108
Currently nothing uses this, but this at least gets the core
algorithm in, and adds some test to demonstrate correctness.
Differential Revision: https://reviews.llvm.org/D40736
llvm-svn: 319854
We currently use target_link_libraries without an explicit scope
specifier (INTERFACE, PRIVATE or PUBLIC) when linking executables.
Dependencies added in this way apply to both the target and its
dependencies, i.e. they become part of the executable's link interface
and are transitive.
Transitive dependencies generally don't make sense for executables,
since you wouldn't normally be linking against an executable. This also
causes issues for generating install export files when using
LLVM_DISTRIBUTION_COMPONENTS. For example, clang has a lot of LLVM
library dependencies, which are currently added as interface
dependencies. If clang is in the distribution components but the LLVM
libraries it depends on aren't (which is a perfectly legitimate use case
if the LLVM libraries are being built static and there are therefore no
run-time dependencies on them), CMake will complain about the LLVM
libraries not being in export set when attempting to generate the
install export file for clang. This is reasonable behavior on CMake's
part, and the right thing is for LLVM's build system to explicitly use
PRIVATE dependencies for executables.
Unfortunately, CMake doesn't allow you to mix and match the keyword and
non-keyword target_link_libraries signatures for a single target; i.e.,
if a single call to target_link_libraries for a particular target uses
one of the INTERFACE, PRIVATE, or PUBLIC keywords, all other calls must
also be updated to use those keywords. This means we must do this change
in a single shot. I also fully expect to have missed some instances; I
tested by enabling all the projects in the monorepo (except dragonegg),
and configuring both with and without shared libraries, on both Darwin
and Linux, but I'm planning to rely on the buildbots for other
configurations (since it should be pretty easy to fix those).
Even after this change, we still have a lot of target_link_libraries
calls that don't specify a scope keyword, mostly for shared libraries.
I'm thinking about addressing those in a follow-up, but that's a
separate change IMO.
Differential Revision: https://reviews.llvm.org/D40823
llvm-svn: 319840
The motivation behind this patch is that future directions require us to
be able to compute the hash value of records independently of actually
using them for de-duplication.
The current structure of TypeSerializer / TypeTableBuilder being a
single entry point that takes an unserialized type record, and then
hashes and de-duplicates it is not flexible enough to allow this.
At the same time, the existing TypeSerializer is already extremely
complex for this very reason -- it tries to be too many things. In
addition to serializing, hashing, and de-duplicating, ti also supports
splitting up field list records and adding continuations. All of this
functionality crammed into this one class makes it very complicated to
work with and hard to maintain.
To solve all of these problems, I've re-written everything from scratch
and split the functionality into separate pieces that can easily be
reused. The end result is that one class TypeSerializer is turned into 3
new classes SimpleTypeSerializer, ContinuationRecordBuilder, and
TypeTableBuilder, each of which in isolation is simple and
straightforward.
A quick summary of these new classes and their responsibilities are:
- SimpleTypeSerializer : Turns a non-FieldList leaf type into a series of
bytes. Does not do any hashing. Every time you call it, it will
re-serialize and return bytes again. The same instance can be re-used
over and over to avoid re-allocations, and in exchange for this
optimization the bytes returned by the serializer only live until the
caller attempts to serialize a new record.
- ContinuationRecordBuilder : Turns a FieldList-like record into a series
of fragments. Does not do any hashing. Like SimpleTypeSerializer,
returns references to privately owned bytes, so the storage is
invalidated as soon as the caller tries to re-use the instance. Works
equally well for LF_FIELDLIST as it does for LF_METHODLIST, solving a
long-standing theoretical limitation of the previous implementation.
- TypeTableBuilder : Accepts sequences of bytes that the user has already
serialized, and inserts them by de-duplicating with a hash table. For
the sake of convenience and efficiency, this class internally stores a
SimpleTypeSerializer so that it can accept unserialized records. The
same is not true of ContinuationRecordBuilder. The user is required to
create their own instance of ContinuationRecordBuilder.
Differential Revision: https://reviews.llvm.org/D40518
llvm-svn: 319198
The previous implementation would only look 1 DW_AT_specification or DW_AT_abstract_origin deep. This means DWARFDie::getName() would fail in certain cases. I ran into such a case while creating a tool that used the LLVM DWARF parser to generate a symbolication format so I have seen this in the wild.
Differential Revision: https://reviews.llvm.org/D40156
llvm-svn: 319104
The existing library assumed that a stream's length would never
change. This makes some things simpler, but it's not flexible
enough for what we need, especially for writable streams where
what you really want is for each call to write to actually append.
llvm-svn: 319070
Initial changes to support debugging PE/COFF files with LLDB on Windows through DIA SDK.
There is another set of changes required on the LLDB side before this does anything.
Differential Revision: https://reviews.llvm.org/D39517
llvm-svn: 318403
This adds type index discovery and dumper support for symbol record kind
0x1168, which is a list of inlined function ids. This symbol kind is
undocumented, but S_INLINEES is consistent with the existing
nomenclature.
Fixes PR34222
llvm-svn: 316398
MCObjectStreamer owns its MCCodeEmitter -- this fixes the types to reflect that,
and allows us to remove the last instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315531
MCObjectStreamer owns its MCAsmBackend -- this fixes the types to reflect that,
and allows us to remove another instance of MCObjectStreamer's weird "holding
ownership via someone else's reference" trick.
llvm-svn: 315410
This patch makes the `.eh_frame` extension an alias for `.debug_frame`.
Up till now it was only possible to dump the section using objdump, but
not with dwarfdump. Since the two are essentially interchangeable, we
dump whichever of the two is present.
As a workaround, this patch also adds parsing for 3 currently
unimplemented CFA instructions: `DW_CFA_def_cfa_expression`,
`DW_CFA_expression`, and `DW_CFA_val_expression`. Because I lack the
required knowledge, I just parse the fields without actually creating
the instructions.
Finally, this also fixes the typo in the `.debug_frame` section name
which incorrectly contained a trailing `s`.
Differential revision: https://reviews.llvm.org/D37852
llvm-svn: 313530
This patch started as an attempt to rebase Greg's differential (D32821).
The result is both quite similar and different at the same time. It adds
the following checks:
- Verify that all address ranges in a DIE are valid.
- Verify that no ranges within the DIE overlap.
- Verify that no ranges overlap with the ranges of a sibling.
- Verify that children are completely contained in its (direct)
parent's address range. (unless both are subprograms)
Differential revision: https://reviews.llvm.org/D37696
llvm-svn: 313255
This patch started as an attempt to rebase Greg's differential (D32821).
The result is both quite similar and different at the same time. It adds
the following checks:
- Verify that all address ranges in a DIE are valid.
- Verify that no ranges within the DIE overlap.
- Verify that no ranges overlap with the ranges of a sibling.
- Verify that children are completely contained in its (direct)
parent's address range. (unless both are subprograms)
Differential revision: https://reviews.llvm.org/D37696
llvm-svn: 313250
This patch adds prologue verification, which is already present in
Apple's dwarfdump. It checks for invalid directory indices and warns
about duplicate file paths.
Differential revision: https://reviews.llvm.org/D37511
llvm-svn: 312782
The compiler outputs PROC32_ID symbols into the object files
for functions, and these symbols have an embedded type index
which, when copied to the PDB, refer to the IPI stream. However,
the symbols themselves are also converted into regular symbols
(e.g. S_GPROC32_ID -> S_GPROC32), and type indices in the regular
symbol records refer to the TPI stream. So this patch applies
two fixes to function records.
1. It converts ID symbols to the proper non-ID record type.
2. After remapping the type index from the object file's index
space to the PDB file/IPI stream's index space, it then
remaps that index to the TPI stream's index space by.
Besides functions, during the remapping process we were also
discarding symbol record types which we did not recognize.
In particular, we were discarding S_BPREL32 records, which is
what MSVC uses to describe local variables on the stack. So
this patch fixes that as well by copying them to the PDB.
Differential Revision: https://reviews.llvm.org/D36426
llvm-svn: 310394
The PDB reserves certain blocks for the FPM that describe which
blocks in the file are allocated and which are free. We weren't
filling that out at all, and in some cases we were even stomping
it with incorrect data. This patch writes a correct FPM.
Differential Revision: https://reviews.llvm.org/D36235
llvm-svn: 309896
Recently problems have been discovered in the way we write the FPM
(free page map). In order to fix this, we first need to establish
a baseline about what a correct FPM looks like using an MSVC
generated PDB, so that we can then make our own generated PDBs
match. And in order to do this, the dumper needs a mode where it
can dump an FPM so that we can write tests for it.
This patch adds a command to dump the FPM, as well as a test against
a known-good PDB.
llvm-svn: 309894
I was surprised to see the code model being passed to MC. After all,
it assembles code, it doesn't create it.
The one place it is used is in the expansion of .cfi directives to
handle .eh_frame being more that 2gb away from the code.
As far as I can tell, gnu assembler doesn't even have an option to
enable this. Compiling a c file with gcc -mcmodel=large produces a
regular looking .eh_frame. This is probably because in practice linker
parse and recreate .eh_frames.
In llvm this is used because the JIT can place the code and .eh_frame
very far apart. Ideally we would fix the jit and delete this
option. This is hard.
Apart from confusion another problem with the current interface is
that most callers pass CodeModel::Default, which is bad since MC has
no way to map it to the target default if it actually needed to.
This patch then replaces the argument with a boolean with a default
value. The vast majority of users don't ever need to look at it. In
fact, only CodeGen and llvm-mc use it and llvm-mc just to enable more
testing.
llvm-svn: 309884
This changes DwarfContext to delegate to DwarfObject instead of having
pure virtual methods.
With this DwarfContextInMemory is replaced with an implementation of
DwarfObject that is local to a .cpp file.
llvm-svn: 308543
Summary:
Instead of wiring these through the CVTypeVisitor interface, clients
should inspect the CVTypeArray before visiting it and potentially load
up the type server's TPI stream if they need it.
No tests relied on this functionality because LLD was the only client.
Reviewers: ruiu
Subscribers: mgorny, hiraditya, zturner, llvm-commits
Differential Revision: https://reviews.llvm.org/D35394
llvm-svn: 308212
This fixes a cmake configuration issue when LLVM is configured with no targets.
Instead we need to add TestingSupport directly with target_link_libraries.
llvm-svn: 306842
Current implementation looks a bit confusing. It looks like it should
report/print something on error, but it does not do that.
It silently drops a error message when creating triple, though
this behavior is fine generally.
For example if LLVM configured with -DLLVM_TARGETS_TO_BUILD=ARM and
our host is windows, it is expected that we will be unable to
create "i386-pc-windows-msvc" target.
Patch introduces isConfigurationSupported() function that checks
if current configuration is supported for each test and returns early if not.
llvm-svn: 306812