diff --git a/docs/PDB/DbiStream.rst b/docs/PDB/DbiStream.rst index 0a247a13c05..fec0e29ae53 100644 --- a/docs/PDB/DbiStream.rst +++ b/docs/PDB/DbiStream.rst @@ -1,3 +1,445 @@ -===================================== -The PDB DBI (Debug Info) Stream -===================================== +===================================== +The PDB DBI (Debug Info) Stream +===================================== + +.. contents:: + :local: + +.. _dbi_intro: + +Introduction +============ + +The PDB DBI Stream (Index 3) is one of the largest and most important streams +in a PDB file. It contains information about how the program was compiled, +(e.g. compilation flags, etc), the compilands (e.g. object files) that +were used to link together the program, the source files which were used +to build the program, as well as references to other streams that contain more +detailed information about each compiland, such as the CodeView symbol records +contained within each compiland and the source and line information for +functions and other symbols within each compiland. + + +.. _dbi_header: + +Stream Header +============= +At offset 0 of the DBI Stream is a header with the following layout: + + +.. code-block:: c++ + + struct DbiStreamHeader { + int32_t VersionSignature; + uint32_t VersionHeader; + uint32_t Age; + uint16_t GlobalStreamIndex; + uint16_t BuildNumber; + uint16_t PublicStreamIndex; + uint16_t PdbDllVersion; + uint16_t SymRecordStream; + uint16_t PdbDllRbld; + int32_t ModInfoSize; + int32_t SectionContributionSize; + int32_t SectionMapSize; + int32_t SourceInfoSize; + int32_t TypeServerSize; + uint32_t MFCTypeServerIndex; + int32_t OptionalDbgHeaderSize; + int32_t ECSubstreamSize; + uint16_t Flags; + uint16_t Machine; + uint32_t Padding; + }; + +- **VersionSignature** - Unknown meaning. Appears to always be ``-1``. + +- **VersionHeader** - A value from the following enum. + +.. code-block:: c++ + + enum class DbiStreamVersion : uint32_t { + VC41 = 930803, + V50 = 19960307, + V60 = 19970606, + V70 = 19990903, + V110 = 20091201 + }; + +Similar to the :doc:`PDB Stream `, this value always appears to be +``V70``, and it is not clear what the other values are for. + +- **Age** - The number of times the PDB has been written. Equal to the same + field from the :ref:`PDB Stream header `. + +- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream `, + which contains CodeView symbol records for all global symbols. Actual records + are stored in the symbol record stream, and are referenced from this stream. + +- **BuildNumber** - A bitfield containing values representing the major and minor + version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the + program, with the following layout: + +.. code-block:: c++ + + uint16_t MinorVersion : 8; + uint16_t MajorVersion : 7; + uint16_t NewVersionFormat : 1; + +For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``. +If it is ``false``, the layout above does not apply and the reader should consult +the `Microsoft Source Code `__ for +further guidance. + +- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream `, + which contains CodeView symbol records for all public symbols. Actual records + are stored in the symbol record stream, and are referenced from this stream. + +- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this + PDB. Note this obviously does not apply for LLVM as LLVM does not use ``mspdb.dll``. + +- **SymRecordStream** - The stream containing all CodeView symbol records used + by the program. This is used for deduplication, so that many different + compilands can refer to the same symbols without having to include the full record + content inside of each module stream. + +- **PdbDllRbld** - Unknown + +- **MFCTypeServerIndex** - The length of the :ref:dbi_mfc_type_server_substream + +- **Flags** - A bitfield with the following layout, containing various + information about how the program was built: + +.. code-block:: c++ + + uint16_t WasIncrementallyLinked : 1; + uint16_t ArePrivateSymbolsStripped : 1; + uint16_t HasConflictingTypes : 1; + uint16_t Reserved : 13; + +The only one of these that is not self-explanatory is ``HasConflictingTypes``. +Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``. +If it is passed to ``link.exe``, this field will be set. Otherwise it will +not be set. It is unclear what this flag does, although it seems to have +subtle implications on the algorithm used to look up type records. + +- **Machine** - A value from the `CV_CPU_TYPE_e `__ + enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86). + +Immediately after the fixed-size DBI Stream header are ``7`` variable-length +`substreams`. The following ``7`` fields of the DBI Stream header specify the +number of bytes of the corresponding substream. Each substream's contents will +be described in detail :ref:`below `. The length of the entire +DBI Stream should equal ``64`` (the length of the header above) plus the value +of each of the following ``7`` fields. + +- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`. + +- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`. + +- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`. + +- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`. + +- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`. + +- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`. + +- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`. + +.. _dbi_substreams: + +Substreams +========== + +.. _dbi_mod_info_substream: + +Module Info Substream +^^^^^^^^^^^^^^^^^^^^^ + +Begins at offset ``0`` immediately after the :ref:`header `. The +module info substream is an array of variable-length records, each one +describing a single module (e.g. object file) linked into the program. Each +record in the array has the format: + +.. code-block:: c++ + + struct SectionContribEntry { + uint16_t Section; + char Padding1[2]; + int32_t Offset; + int32_t Size; + uint32_t Characteristics; + uint16_t ModuleIndex; + char Padding2[2]; + uint32_t DataCrc; + uint32_t RelocCrc; + }; + +While most of these are self-explanatory, the ``Characteristics`` field +warrants some elaboration. It corresponds to the ``Characteristics`` +field of the `IMAGE_SECTION_HEADER `__ +structure. + +.. code-block:: c++ + + struct ModInfo { + uint32_t Unused1; + SectionContribEntry SectionContr; + uint16_t Flags; + uint16_t ModuleSymStream; + uint32_t SymByteSize; + uint32_t C11ByteSize; + uint32_t C13ByteSize; + uint16_t SourceFileCount; + char Padding[2]; + uint32_t Unused2; + uint32_t SourceFileNameIndex; + uint32_t PdbFilePathNameIndex; + char ModuleName[]; + char ObjFileName[]; + }; + +- **SectionContr** - Describes the properties of the section in the final binary + which contain the code and data from this module. + +- **Flags** - A bitfield with the following format: + +.. code-block:: c++ + + uint16_t Dirty : 1; // ``true`` if this ModInfo has been written since reading the PDB. + uint16_t EC : 1; // ``true`` if EC information is present for this module. It is unknown what EC actually is. + uint16_t Unused : 6; + uint16_t TSM : 8; // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM. + + +- **ModuleSymStream** - The index of the stream that contains symbol information + for this module. This includes CodeView symbol information as well as source + and line information. + +- **SymByteSize** - The number of bytes of data from the stream identified by + ``ModuleSymStream`` that represent CodeView symbol records. + +- **C11ByteSize** - The number of bytes of data from the stream identified by + ``ModuleSymStream`` that represent C11-style CodeView line information. + +- **C13ByteSize** - The number of bytes of data from the stream identified by + ``ModuleSymStream`` that represent C13-style CodeView line information. At + most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero. + +- **SourceFileCount** - The number of source files that contributed to this + module during compilation. + +- **SourceFileNameIndex** - The offset in the names buffer of the primary + translation unit used to build this module. All PDB files observed to date + always have this value equal to 0. + +- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file + containing this module's symbol information. This has only been observed + to be non-zero for the special ``* Linker *`` module. + +- **ModuleName** - The module name. This is usually either a full path to an + object file (either directly passed to ``link.exe`` or from an archive) or + a string of the form ``Import:``. + +- **ObjFileName** - The object file name. In the case of an module that is + linked directly passed to ``link.exe``, this is the same as **ModuleName**. + In the case of a module that comes from an archive, this is usually the full + path to the archive. + +.. _dbi_sec_contr_substream: + +Section Contribution Substream +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends, +and consumes ``Header->SectionContributionSize`` bytes. This substream begins +with a single ``uint32_t`` which will be one of the following values: + +.. code-block:: c++ + + enum class SectionContrSubstreamVersion : uint32_t { + Ver60 = 0xeffe0000 + 19970605, + V2 = 0xeffe0000 + 20140516 + }; + +``Ver60`` is the only value which has been observed in a PDB so far. Following +this ``4`` byte field is an array of fixed-length structures. If the version +is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the +version is ``V2``, it is an array of ``SectionContribEntry2`` structures, +defined as follows: + +.. code-block:: c++ + + struct SectionContribEntry2 { + SectionContribEntry SC; + uint32_t ISectCoff; + }; + +The purpose of the second field is not well understood. + + +.. _dbi_section_map_substream: + +Section Map Substream +^^^^^^^^^^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends, +and consumes ``Header->SectionMapSize`` bytes. This substream begins with an ``8`` +byte header followed by an array of fixed-length records. The header and records +have the following layout: + +.. code-block:: c++ + + struct SectionMapHeader { + uint16_t Count; // Number of segment descriptors + uint16_t LogCount; // Number of logical segment descriptors + }; + + struct SectionMapEntry { + uint16_t Flags; // See the SectionMapEntryFlags enum below. + uint16_t Ovl; // Logical overlay number + uint16_t Group; // Group index into descriptor array. + uint16_t Frame; + uint16_t SectionName; // Byte index of segment / group name in string table, or 0xFFFF. + uint16_t ClassName; // Byte index of class in string table, or 0xFFFF. + uint32_t Offset; // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group. + uint32_t SectionLength; // Byte count of the segment or group. + }; + + enum class SectionMapEntryFlags : uint16_t { + Read = 1 << 0, // Segment is readable. + Write = 1 << 1, // Segment is writable. + Execute = 1 << 2, // Segment is executable. + AddressIs32Bit = 1 << 3, // Descriptor describes a 32-bit linear address. + IsSelector = 1 << 8, // Frame represents a selector. + IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address. + IsGroup = 1 << 10 // If set, descriptor represents a group. + }; + +Many of these fields are not well understood, so will not be discussed further. + +.. _dbi_file_info_substream: + +File Info Substream +^^^^^^^^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends, +and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping +from module to the source files that contribute to that module. Since multiple +modules can use the same source file (for example, a header file), this substream +uses a string table to store each unique file name only once, and then have each +module use offsets into the string table rather than embedding the string's value +directly. The format of this substream is as follows: + +.. code-block:: c++ + + struct FileInfoSubstream { + uint16_t NumModules; + uint16_t NumSourceFiles; + + uint16_t ModIndices[NumModules]; + uint16_t ModFileCounts[NumModules]; + uint32_t FileNameOffsets[NumSourceFiles]; + char NamesBuffer[][NumSourceFiles]; + }; + +**NumModules** - The number of modules for which source file information is +contained within this substream. Should match the corresponding value from the +ref:`dbi_header`. + +**NumSourceFiles**: In theory this is supposed to contain the number of source +files for which this substream contains information. But that would present a +problem in that the width of this field being ``16``-bits would prevent one from +having more than 64K source files in a program. In early versions of the file +format, this seems to have been the case. In order to support more than this, this +field of the is simply ignored, and computed dynamically by summing up the values of +the ``ModFileCounts`` array (discussed below). In short, this value should be +ignored. + +**ModIndices** - This array is present, but does not appear to be useful. + +**ModFileCountArray** - An array of ``NumModules`` integers, each one containing +the number of source files which contribute to the module at the specified index. +While each individual module is limited to 64K contributing source files, the +union of all modules' source files may be greater than 64K. The real number of +source files is thus computed by summing this array. Note that summing this array +does not give the number of `unique` source files, only the total number of source +file contributions to modules. + +**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles** +here refers to the 32-bit value obtained from summing **ModFileCountArray**), where +each integer is an offset into **NamesBuffer** pointing to a null terminated string. + +**NamesBuffer** - An array of null terminated strings containing the actual source +file names. + +.. _dbi_type_server_substream: + +Type Server Substream +^^^^^^^^^^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends, +and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout +of this substream is understood, although it is assumed to related somehow to the +usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further. + +.. _dbi_ec_substream: + +EC Substream +^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends, +and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout +of this substream is understood, and it will not be discussed further. + +.. _dbi_optional_dbg_stream: + +Optional Debug Header Stream +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and +consumes ``Header->OptionalDbgHeaderSize`` bytes. This field is an array of +stream indices (e.g. ``uint16_t``'s), each of which identifies a stream +index in the larger MSF file which contains some additional debug information. +Each position of this array has a special meaning, allowing one to determine +what kind of debug information is at the referenced stream. ``11`` indices +are currently understood, although it's possible there may be more. The +layout of each stream generally corresponds exactly to a particular type +of debug data directory from the PE/COFF file. The format of these fields +can be found in the `Microsoft PE/COFF Specification `__. + +**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a +debug data directory of type ``IMAGE_DEBUG_TYPE_FPO`` + +**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream +is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``. + +**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a +debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``. + +**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream +is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This +is used for mapping addresses between instrumented and uninstrumented code. + +**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream +is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This +is used for mapping addresses between instrumented and uninstrumented code. + +**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from +the original executable. + +**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not +understood, but it is assumed to be a mapping from ``CLR Token`` to +``CLR Record ID``. Refer to `ECMA 335 `__ +for more information. + +**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the +executable. + +**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata`` +section from the executable, but that would make it identical to +``DbgStreamArray[1]``. The difference between these two indices is not well +understood. + +**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a +debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this +differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have +used the "new" format rather than the "old" format. + +**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar +to ``DbgStreamArray[5]``, but has not been observed in practice. diff --git a/docs/PDB/index.rst b/docs/PDB/index.rst index 2c1c3e38f3f..95328cfc5a4 100644 --- a/docs/PDB/index.rst +++ b/docs/PDB/index.rst @@ -37,6 +37,11 @@ repo `__. File Layout =========== +.. important:: + Unless otherwise specified, all numeric values are encoded in little endian. + If you see a type such as ``uint16_t`` or ``uint64_t`` going forward, always + assume it is little endian! + .. toctree:: :hidden: