===================================== | |
The MSF File Format | |
===================================== | |
.. contents:: | |
:local: | |
.. _msf_layout: | |
File Layout | |
=========== | |
The MSF file format consists of the following components: | |
1. :ref:`msf_superblock` | |
2. :ref:`msf_freeblockmap` (also know as Free Page Map, or FPM) | |
3. Data | |
Each component is stored as an indexed block, the length of which is specified | |
in ``SuperBlock::BlockSize``. The file consists of 1 or more iterations of the | |
following pattern (sometimes referred to as an "interval"): | |
1. 1 block of data | |
2. Free Block Map 1 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 1) | |
3. Free Block Map 2 (corresponds to ``SuperBlock::FreeBlockMapBlock`` 2) | |
4. ``SuperBlock::BlockSize - 3`` blocks of data | |
In the first interval, the first data block is used to store | |
:ref:`msf_superblock`. | |
The following diagram demonstrates the general layout of the file (\| denotes | |
the end of an interval, and is for visualization purposes only): | |
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+ | |
| Block Index | 0 | 1 | 2 | 3 - 4095 | \| | 4096 | 4097 | 4098 | 4099 - 8191 | \| | ... | | |
+=============+=======================+==================+==================+==========+====+======+======+======+=============+====+=====+ | |
| Meaning | :ref:`msf_superblock` | Free Block Map 1 | Free Block Map 2 | Data | \| | Data | FPM1 | FPM2 | Data | \| | ... | | |
+-------------+-----------------------+------------------+------------------+----------+----+------+------+------+-------------+----+-----+ | |
The file may end after any block, including immediately after a FPM1. | |
.. note:: | |
LLVM only supports 4096 byte blocks (sometimes referred to as the "BigMsf" | |
variant), so the rest of this document will assume a block size of 4096. | |
.. _msf_superblock: | |
The Superblock | |
============== | |
At file offset 0 in an MSF file is the MSF *SuperBlock*, which is laid out as | |
follows: | |
.. code-block:: c++ | |
struct SuperBlock { | |
char FileMagic[sizeof(Magic)]; | |
ulittle32_t BlockSize; | |
ulittle32_t FreeBlockMapBlock; | |
ulittle32_t NumBlocks; | |
ulittle32_t NumDirectoryBytes; | |
ulittle32_t Unknown; | |
ulittle32_t BlockMapAddr; | |
}; | |
- **FileMagic** - Must be equal to ``"Microsoft C / C++ MSF 7.00\\r\\n"`` | |
followed by the bytes ``1A 44 53 00 00 00``. | |
- **BlockSize** - The block size of the internal file system. Valid values are | |
512, 1024, 2048, and 4096 bytes. Certain aspects of the MSF file layout vary | |
depending on the block sizes. For the purposes of LLVM, we handle only block | |
sizes of 4KiB, and all further discussion assumes a block size of 4KiB. | |
- **FreeBlockMapBlock** - The index of a block within the file, at which begins | |
a bitfield representing the set of all blocks within the file which are "free" | |
(i.e. the data within that block is not used). See :ref:`msf_freeblockmap` for | |
more information. | |
**Important**: ``FreeBlockMapBlock`` can only be ``1`` or ``2``! | |
- **NumBlocks** - The total number of blocks in the file. ``NumBlocks * BlockSize`` | |
should equal the size of the file on disk. | |
- **NumDirectoryBytes** - The size of the stream directory, in bytes. The stream | |
directory contains information about each stream's size and the set of blocks | |
that it occupies. It will be described in more detail later. | |
- **BlockMapAddr** - The index of a block within the MSF file. At this block is | |
an array of ``ulittle32_t``'s listing the blocks that the stream directory | |
resides on. For large MSF files, the stream directory (which describes the | |
block layout of each stream) may not fit entirely on a single block. As a | |
result, this extra layer of indirection is introduced, whereby this block | |
contains the list of blocks that the stream directory occupies, and the stream | |
directory itself can be stitched together accordingly. The number of | |
``ulittle32_t``'s in this array is given by ``ceil(NumDirectoryBytes / BlockSize)``. | |
.. _msf_freeblockmap: | |
The Free Block Map | |
================== | |
The Free Block Map (sometimes referred to as the Free Page Map, or FPM) is a | |
series of blocks which contains a bit flag for every block in the file. The | |
flag will be set to 0 if the block is in use, and 1 if the block is unused. | |
Each file contains two FPMs, one of which is active at any given time. This | |
feature is designed to support incremental and atomic updates of the underlying | |
MSF file. While writing to an MSF file, if the active FPM is FPM1, you can | |
write your new modified bitfield to FPM2, and vice versa. Only when you commit | |
the file to disk do you need to swap the value in the SuperBlock to point to | |
the new ``FreeBlockMapBlock``. | |
The Free Block Maps are stored as a series of single blocks thoughout the file | |
at intervals of BlockSize. Because each FPM block is of size ``BlockSize`` | |
bytes, it contains 8 times as many bits as an interval has blocks. This means | |
that the first block of each FPM refers to the first 8 intervals of the file | |
(the first 32768 blocks), the second block of each FPM refers to the next 8 | |
blocks, and so on. This results in far more FPM blocks being present than are | |
required, but in order to maintain backwards compatibility the format must stay | |
this way. | |
The Stream Directory | |
==================== | |
The Stream Directory is the root of all access to the other streams in an MSF | |
file. Beginning at byte 0 of the stream directory is the following structure: | |
.. code-block:: c++ | |
struct StreamDirectory { | |
ulittle32_t NumStreams; | |
ulittle32_t StreamSizes[NumStreams]; | |
ulittle32_t StreamBlocks[NumStreams][]; | |
}; | |
And this structure occupies exactly ``SuperBlock->NumDirectoryBytes`` bytes. | |
Note that each of the last two arrays is of variable length, and in particular | |
that the second array is jagged. | |
**Example:** Suppose a hypothetical PDB file with a 4KiB block size, and 4 | |
streams of lengths {1000 bytes, 8000 bytes, 16000 bytes, 9000 bytes}. | |
Stream 0: ceil(1000 / 4096) = 1 block | |
Stream 1: ceil(8000 / 4096) = 2 blocks | |
Stream 2: ceil(16000 / 4096) = 4 blocks | |
Stream 3: ceil(9000 / 4096) = 3 blocks | |
In total, 10 blocks are used. Let's see what the stream directory might look | |
like: | |
.. code-block:: c++ | |
struct StreamDirectory { | |
ulittle32_t NumStreams = 4; | |
ulittle32_t StreamSizes[] = {1000, 8000, 16000, 9000}; | |
ulittle32_t StreamBlocks[][] = { | |
{4}, | |
{5, 6}, | |
{11, 9, 7, 8}, | |
{10, 15, 12} | |
}; | |
}; | |
In total, this occupies ``15 * 4 = 60`` bytes, so ``SuperBlock->NumDirectoryBytes`` | |
would equal ``60``, and ``SuperBlock->BlockMapAddr`` would be an array of one | |
``ulittle32_t``, since ``60 <= SuperBlock->BlockSize``. | |
Note also that the streams are discontiguous, and that part of stream 3 is in the | |
middle of part of stream 2. You cannot assume anything about the layout of the | |
blocks! | |
Alignment and Block Boundaries | |
============================== | |
As may be clear by now, it is possible for a single field (whether it be a high | |
level record, a long string field, or even a single ``uint16``) to begin and | |
end in separate blocks. For example, if the block size is 4096 bytes, and a | |
``uint16`` field begins at the last byte of the current block, then it would | |
need to end on the first byte of the next block. Since blocks are not | |
necessarily contiguously laid out in the file, this means that both the consumer | |
and the producer of an MSF file must be prepared to split data apart | |
accordingly. In the aforementioned example, the high byte of the ``uint16`` | |
would be written to the last byte of block N, and the low byte would be written | |
to the first byte of block N+1, which could be tens of thousands of bytes later | |
(or even earlier!) in the file, depending on what the stream directory says. |