=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file. It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were used to link together the program, the source files which were used
to build the program, as well as references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.


.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:


.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written. Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.
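
As a minimal sketch, the ``BuildNumber`` fields can be extracted with shifts and masks instead of relying on compiler-specific bitfield layout. The ``DbiBuildNumber`` type and ``decodeBuildNumber`` function are hypothetical names, not part of any library. The sketch assumes the common little-endian convention that the first-declared bitfield occupies the least significant bits.

.. code-block:: c++

  #include <cstdint>

  // Sketch only: hypothetical decoded form of DbiStreamHeader::BuildNumber.
  struct DbiBuildNumber {
    unsigned Major;
    unsigned Minor;
    bool NewVersionFormat;
  };

  // Assumes bits 0-7 hold MinorVersion, bits 8-14 hold MajorVersion and
  // bit 15 holds NewVersionFormat (first bitfield = least significant bits).
  inline DbiBuildNumber decodeBuildNumber(uint16_t Raw) {
    DbiBuildNumber B;
    B.Minor = Raw & 0xFFu;
    B.Major = (Raw >> 8) & 0x7Fu;
    B.NewVersionFormat = ((Raw >> 15) & 1u) != 0;
    return B;
  }

Under these assumptions, a toolchain version of 12.0 with ``NewVersionFormat`` set is encoded as ``0x8C00``.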

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB. This does not apply to LLVM, which does not use ``mspdb.dll``.

- **SymRecordStream** - The stream containing all CodeView symbol records used
  by the program. This is used for deduplication, so that many different
  compilands can refer to the same symbols without having to include the full record
  content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
This field is set if and only if that flag is passed to ``link.exe``. It is
unclear what this flag does, although it seems to have subtle implications for
the algorithm used to look up type records.
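
A similar sketch decodes ``Flags``. The names are again hypothetical, and the sketch assumes ``WasIncrementallyLinked`` is the least significant bit. The ``Reserved`` bits are ignored.

.. code-block:: c++

  #include <cstdint>

  // Sketch only: hypothetical decoded form of DbiStreamHeader::Flags.
  struct DbiFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  // Assumes bit 0 = WasIncrementallyLinked, bit 1 = ArePrivateSymbolsStripped,
  // bit 2 = HasConflictingTypes; bits 3-15 are reserved.
  inline DbiFlags decodeDbiFlags(uint16_t Raw) {
    DbiFlags F;
    F.WasIncrementallyLinked = (Raw & 0x1u) != 0;
    F.ArePrivateSymbolsStripped = (Raw & 0x2u) != 0;
    F.HasConflictingTypes = (Raw & 0x4u) != 0;
    return F;
  }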
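
- **Machine** - The target machine type. Common values are ``0x8664`` (x86-64)
  and ``0x14C`` (x86), which are the PE/COFF ``IMAGE_FILE_MACHINE_AMD64`` and
  ``IMAGE_FILE_MACHINE_I386`` constants rather than values from the
  `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration used by CodeView compile symbols.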

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`. The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream. Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`. The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
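
The offset of each substream follows from these sizes. The sketch below computes those offsets and the expected stream length. It uses hypothetical names and is not LLVM's API. It follows the physical order described in the substream sections, where the EC substream comes before the optional debug header even though ``ECSubstreamSize`` follows ``OptionalDbgHeaderSize`` in the header. Treating a negative size as corruption is an assumption of this sketch.

.. code-block:: c++

  #include <cstdint>
  #include <optional>

  // Sketch only: the seven size fields, in DbiStreamHeader field order.
  struct DbiSubstreamSizes {
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerSize;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
  };

  // Offsets from the start of the DBI stream, in physical layout order.
  struct DbiSubstreamOffsets {
    uint64_t ModInfo;
    uint64_t SectionContribution;
    uint64_t SectionMap;
    uint64_t FileInfo;
    uint64_t TypeServer;
    uint64_t EC;
    uint64_t OptionalDbgHeader;
    uint64_t End; // Expected total length of the DBI stream.
  };

  constexpr uint64_t kDbiHeaderSize = 64;

  inline std::optional<DbiSubstreamOffsets>
  computeSubstreamOffsets(const DbiSubstreamSizes &S) {
    // Assumption: a negative size means the header is corrupt.
    if (S.ModInfoSize < 0 || S.SectionContributionSize < 0 ||
        S.SectionMapSize < 0 || S.SourceInfoSize < 0 || S.TypeServerSize < 0 ||
        S.OptionalDbgHeaderSize < 0 || S.ECSubstreamSize < 0)
      return std::nullopt;
    DbiSubstreamOffsets O;
    O.ModInfo = kDbiHeaderSize;
    O.SectionContribution = O.ModInfo + uint64_t(S.ModInfoSize);
    O.SectionMap = O.SectionContribution + uint64_t(S.SectionContributionSize);
    O.FileInfo = O.SectionMap + uint64_t(S.SectionMapSize);
    O.TypeServer = O.FileInfo + uint64_t(S.SourceInfoSize);
    O.EC = O.TypeServer + uint64_t(S.TypeServerSize);
    // The EC substream precedes the optional debug header.
    O.OptionalDbgHeader = O.EC + uint64_t(S.ECSubstreamSize);
    O.End = O.OptionalDbgHeader + uint64_t(S.OptionalDbgHeaderSize);
    return O;
  }

A reader can compare ``End`` against the actual length of the DBI stream as a cheap consistency check before slicing out any substream.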

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`. The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program. Each
record is a ``ModInfo`` structure that embeds the following fixed-size
``SectionContribEntry`` structure, which is also used by the
:ref:`dbi_sec_contr_substream`:

.. code-block:: c++

  struct SectionContribEntry {
    uint16_t Section;
    char Padding1[2];
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
    char Padding2[2];
    uint32_t DataCrc;
    uint32_t RelocCrc;
  };

While most of these are self-explanatory, the ``Characteristics`` field
warrants some elaboration. It corresponds to the ``Characteristics``
field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
structure. Each ``ModInfo`` record has the following layout:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    SectionContribEntry SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];   // Null-terminated, variable length.
    char ObjFileName[];  // Null-terminated, immediately follows ModuleName.
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

- **Flags** - A bitfield with the following format:

.. code-block:: c++

  uint16_t Dirty : 1;   // true if this ModInfo has been written since reading the PDB.
  uint16_t EC : 1;      // true if EC information is present for this module.
                        // It is unknown what EC actually is.
  uint16_t Unused : 6;
  uint16_t TSM : 8;     // Type Server Index for this module. It is unknown what
                        // this is used for, but it is not used by LLVM.


- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module. This includes CodeView symbol information as well as source
  and line information.

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information. At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module. All PDB files observed to date
  have had this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information. This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name. This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name. In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
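
The following minimal sketch illustrates the record layout. The fixed-size part of ``ModInfo`` (``Unused1`` through ``PdbFilePathNameIndex``) is ``64`` bytes and is followed by the two null-terminated strings. The sketch also assumes that each record is padded to a ``4``-byte boundary. LLVM's reader and writer do this, but the layout above does not state it. The helper and type names are hypothetical.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <string>
  #include <vector>

  // Sketch only: reads a little-endian unsigned integer of type T.
  template <typename T> T readLE(const uint8_t *P) {
    T V = 0;
    for (size_t I = 0; I < sizeof(T); ++I)
      V |= static_cast<T>(static_cast<T>(P[I]) << (8 * I));
    return V;
  }

  // Hypothetical subset of the ModInfo fields.
  struct ParsedModInfo {
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    std::string ModuleName;
    std::string ObjFileName;
    size_t RecordSize; // Bytes occupied by the record, including padding.
  };

  // Reads a null-terminated string at Off and advances Off past the terminator.
  inline std::optional<std::string> readCString(const std::vector<uint8_t> &Buf,
                                                size_t &Off) {
    if (Off >= Buf.size())
      return std::nullopt;
    const uint8_t *Start = Buf.data() + Off;
    const void *Nul = std::memchr(Start, 0, Buf.size() - Off);
    if (!Nul)
      return std::nullopt;
    const size_t Len = static_cast<size_t>(static_cast<const uint8_t *>(Nul) - Start);
    std::string S(reinterpret_cast<const char *>(Start), Len);
    Off += Len + 1;
    return S;
  }

  // Parses the ModInfo record that starts at Buf[Start].
  inline std::optional<ParsedModInfo> parseModInfo(const std::vector<uint8_t> &Buf,
                                                   size_t Start) {
    constexpr size_t kFixedSize = 64; // Unused1 through PdbFilePathNameIndex.
    if (Start > Buf.size() || Buf.size() - Start < kFixedSize)
      return std::nullopt;
    const uint8_t *P = Buf.data() + Start;
    ParsedModInfo M;
    M.Flags = readLE<uint16_t>(P + 32); // After Unused1 (4) + SectionContr (28).
    M.ModuleSymStream = readLE<uint16_t>(P + 34);
    M.SymByteSize = readLE<uint32_t>(P + 36);
    M.C11ByteSize = readLE<uint32_t>(P + 40);
    M.C13ByteSize = readLE<uint32_t>(P + 44);
    M.SourceFileCount = readLE<uint16_t>(P + 48);
    size_t Off = Start + kFixedSize;
    std::optional<std::string> Mod = readCString(Buf, Off);
    if (!Mod)
      return std::nullopt;
    std::optional<std::string> Obj = readCString(Buf, Off);
    if (!Obj)
      return std::nullopt;
    M.ModuleName = *Mod;
    M.ObjFileName = *Obj;
    // Assumption: each record is padded to a 4-byte boundary.
    M.RecordSize = ((Off - Start) + 3) & ~static_cast<size_t>(3);
    return M;
  }

A caller would walk the substream by advancing ``Start`` by ``RecordSize`` until ``ModInfoSize`` bytes have been consumed. Under the same least-significant-bit-first assumption used for the header bitfields, ``Flags >> 8`` yields the ``TSM`` value.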

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far. Following
this ``4``-byte field is an array of fixed-length structures. If the version
is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the
version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood.
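
This sketch reads the substream after it has been sliced out of the DBI stream into a byte vector. Entries are ``28`` bytes for ``Ver60`` and ``32`` bytes for ``V2``, which appends ``ISectCoff``. Treating trailing bytes that do not form a whole entry as corruption is an assumption. The names are hypothetical.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Sketch only: version values from SectionContrSubstreamVersion.
  constexpr uint32_t kSecContrVer60 = 0xeffe0000u + 19970605u;
  constexpr uint32_t kSecContrV2 = 0xeffe0000u + 20140516u;

  // Hypothetical subset of SectionContribEntry (padding and CRCs dropped).
  struct SectionContrib {
    uint16_t Section;
    int32_t Offset;
    int32_t Size;
    uint32_t Characteristics;
    uint16_t ModuleIndex;
  };

  inline uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  inline uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Parses a complete Section Contribution substream held in Buf.
  inline std::optional<std::vector<SectionContrib>>
  parseSectionContribs(const std::vector<uint8_t> &Buf) {
    if (Buf.size() < 4)
      return std::nullopt;
    const uint32_t Version = readU32(Buf.data());
    size_t EntrySize;
    if (Version == kSecContrVer60)
      EntrySize = 28; // sizeof(SectionContribEntry)
    else if (Version == kSecContrV2)
      EntrySize = 32; // SectionContribEntry followed by ISectCoff
    else
      return std::nullopt;
    if ((Buf.size() - 4) % EntrySize != 0)
      return std::nullopt;

    std::vector<SectionContrib> Entries;
    for (size_t Off = 4; Off < Buf.size(); Off += EntrySize) {
      const uint8_t *P = Buf.data() + Off;
      SectionContrib C;
      C.Section = readU16(P);                          // Padding1 at P + 2
      C.Offset = static_cast<int32_t>(readU32(P + 4));
      C.Size = static_cast<int32_t>(readU32(P + 8));
      C.Characteristics = readU32(P + 12);
      C.ModuleIndex = readU16(P + 16);                 // Padding2 and CRCs follow
      Entries.push_back(C);
    }
    return Entries;
  }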


.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes. This substream begins with a ``4``-byte
header followed by an array of fixed-length (``20``-byte) records. The header and
records have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.
                            // If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so they will not be discussed further.
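
A minimal sketch of reading the section map follows. It assumes that the number of ``SectionMapEntry`` records equals the header's ``Count`` field, and it decodes the ``Read``/``Write``/``Execute`` bits of ``Flags`` into an ``RWX``-style string. The function and type names are hypothetical.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <string>
  #include <vector>

  // Sketch only: mirrors SectionMapEntry field for field.
  struct SectionMapRecord {
    uint16_t Flags, Ovl, Group, Frame, SectionName, ClassName;
    uint32_t Offset, SectionLength;
  };

  inline uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  inline uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Parses a complete Section Map substream held in Buf.
  inline std::optional<std::vector<SectionMapRecord>>
  parseSectionMap(const std::vector<uint8_t> &Buf) {
    constexpr size_t kHeaderSize = 4, kEntrySize = 20;
    if (Buf.size() < kHeaderSize)
      return std::nullopt;
    // Assumption: Count gives the number of entries; LogCount is unused here.
    const uint16_t Count = readU16(Buf.data());
    if (Buf.size() < kHeaderSize + size_t(Count) * kEntrySize)
      return std::nullopt;
    std::vector<SectionMapRecord> Records;
    for (size_t I = 0; I < Count; ++I) {
      const uint8_t *P = Buf.data() + kHeaderSize + I * kEntrySize;
      Records.push_back({readU16(P), readU16(P + 2), readU16(P + 4),
                         readU16(P + 6), readU16(P + 8), readU16(P + 10),
                         readU32(P + 12), readU32(P + 16)});
    }
    return Records;
  }

  // Renders the Read/Write/Execute bits of SectionMapEntryFlags, e.g. "R-X".
  inline std::string accessString(uint16_t Flags) {
    std::string S;
    S += (Flags & (1u << 0)) ? 'R' : '-';
    S += (Flags & (1u << 1)) ? 'W' : '-';
    S += (Flags & (1u << 2)) ? 'X' : '-';
    return S;
  }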

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
from module to the source files that contribute to that module. Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly. The format of this substream is as follows:

.. code-block:: c++

  // Pseudo-C: the array lengths below are only known at runtime.
  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles]; // NumSourceFiles = sum of ModFileCounts (see below).
    char NamesBuffer[];                       // Null-terminated strings referenced by FileNameOffsets.
  };

**NumModules** - The number of modules for which source file information is
contained within this substream. Should match the number of records in the
:ref:`dbi_mod_info_substream`.

**NumSourceFiles** - In theory, this field contains the number of source files
described by this substream. However, because the field is only ``16`` bits wide,
that would limit a program to 64K source files, which seems to have been the case
in early versions of the file format. To support more files, this field is ignored,
and the count is instead computed by summing the values of the ``ModFileCounts``
array (discussed below). In short, this value should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K. The real number of
source files is thus computed by summing this array. Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null-terminated string.

**NamesBuffer** - An array of null-terminated strings containing the actual source
file names.
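
The following hypothetical sketch reads this substream into a per-module list of file names. As recommended above, it ignores ``NumSourceFiles`` and ``ModIndices``. It also assumes that the ``FileNameOffsets`` entries for each module are stored contiguously in module order. Under that assumption, module ``i``'s files start at the sum of the counts of modules ``0`` through ``i - 1``.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <string>
  #include <vector>

  inline uint16_t readU16(const uint8_t *P) {
    return static_cast<uint16_t>(P[0] | (P[1] << 8));
  }

  inline uint32_t readU32(const uint8_t *P) {
    return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
           (uint32_t(P[3]) << 24);
  }

  // Sketch only: returns the source file names of each module, in module order.
  inline std::optional<std::vector<std::vector<std::string>>>
  parseFileInfo(const std::vector<uint8_t> &Buf) {
    if (Buf.size() < 4)
      return std::nullopt;
    const size_t NumModules = readU16(Buf.data());
    // NumSourceFiles (offset 2) is deliberately ignored.
    const size_t CountsOff = 4 + 2 * NumModules; // Skips ModIndices.
    const size_t OffsetsOff = CountsOff + 2 * NumModules;
    if (OffsetsOff > Buf.size())
      return std::nullopt;

    // The real number of file name offsets is the sum of ModFileCounts.
    size_t TotalFiles = 0;
    for (size_t I = 0; I < NumModules; ++I)
      TotalFiles += readU16(Buf.data() + CountsOff + 2 * I);
    const size_t NamesOff = OffsetsOff + 4 * TotalFiles;
    if (NamesOff > Buf.size())
      return std::nullopt;
    const size_t NamesSize = Buf.size() - NamesOff;
    const char *Names = reinterpret_cast<const char *>(Buf.data() + NamesOff);

    std::vector<std::vector<std::string>> Result(NumModules);
    size_t FileIndex = 0; // Assumption: offsets are grouped by module, in order.
    for (size_t I = 0; I < NumModules; ++I) {
      const uint16_t Count = readU16(Buf.data() + CountsOff + 2 * I);
      for (uint16_t J = 0; J < Count; ++J, ++FileIndex) {
        const uint32_t NameOff = readU32(Buf.data() + OffsetsOff + 4 * FileIndex);
        if (NameOff >= NamesSize)
          return std::nullopt;
        const char *Name = Names + NameOff;
        const void *Nul = std::memchr(Name, 0, NamesSize - NameOff);
        if (!Nul)
          return std::nullopt;
        const size_t Len = static_cast<size_t>(static_cast<const char *>(Nul) - Name);
        Result[I].emplace_back(Name, Len);
      }
    }
    return Result;
  }

Because two modules can reference the same offset, a shared header file is stored once in ``NamesBuffer`` but appears in several per-module lists.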

.. _dbi_type_server_substream:

Type Server Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
of this substream is understood, although it is assumed to be related somehow to the
usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
of this substream is understood, and it will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
stream indices (i.e. ``uint16_t`` values), referred to below as ``DbgStreamArray``.
Each element identifies a stream in the larger MSF file which contains some
additional debug information. Each position of this array has a special meaning,
allowing one to determine what kind of debug information is at the referenced
stream. ``11`` indices are currently understood, although it's possible there may
be more. The layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file. The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.

**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.

**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``. The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
used the "new" format rather than the "old" format.

**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
to ``DbgStreamArray[5]``, but has not been observed in practice.
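
The positions above can be captured in an enumeration. In this sketch the enumerator names are hypothetical. Treating ``0xFFFF`` as an empty slot is an assumption that matches the invalid stream index value used by LLVM; it is not described above.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <optional>
  #include <vector>

  // Sketch only: positions within DbgStreamArray, as described above.
  enum class DbgHeaderType : uint16_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10,
  };

  // Assumption: 0xFFFF marks a slot that does not reference any stream.
  constexpr uint16_t kInvalidStreamIndex = 0xFFFF;

  // Returns the MSF stream index stored at the given position, if any.
  inline std::optional<uint16_t>
  getDbgStream(const std::vector<uint8_t> &DbgStreamArray, DbgHeaderType Type) {
    const size_t Pos = static_cast<size_t>(Type) * 2;
    if (Pos + 2 > DbgStreamArray.size())
      return std::nullopt; // The array is too short to contain this slot.
    const uint16_t Index = static_cast<uint16_t>(
        DbgStreamArray[Pos] | (DbgStreamArray[Pos + 1] << 8));
    if (Index == kInvalidStreamIndex)
      return std::nullopt;
    return Index;
  }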