Академический Документы
Профессиональный Документы
Культура Документы
Page 1 of 27
Table of Contents:
Document Introduction .................................................................................................................... 3
Section 1: Information Pertaining To This Document...................................................................... 4
Section 1.1: Intended Audience....................................................................................................... 4
Section 1.2: Conventions in This Document ................................................................................... 4
Section 1.3: The Evolution of NTFS ................................................................................................ 5
Section 1.4: Features ...................................................................................................................... 5
Section 1.5: File Systems and Binary Trees ................................................................................... 6
Section 2: Accessing Data Through Raw Reads ............................................................................ 7
Section 2.1: Beginning The Process Of Extraction ......................................................................... 7
Figure 2.1 Opening The Physical Drive With Read Permissions .................................................... 7
Figure 2.2 Partition Structure........................................................................................................... 8
Section 2.2: Identifying a Partition Type......................................................................................... 8
Figure 2.3 Example Different Partition Types.................................................................................. 9
Section 2.3: The NTFS Boot Sector ................................................................................................ 9
Figure 2.4 NTFS Boot Sector Layout ............................................................................................ 10
Section 2.4: Master File Table (MFT) ............................................................................................ 11
Figure 2.5 MFT File Entry Header ................................................................................................. 12
Section 2.5: Metafiles .................................................................................................................... 13
Figure 2.6 MFT Special Files......................................................................................................... 13
Section 2.6: MFT Record Attributes .............................................................................................. 14
Figure 2.7 MFT Record Attribute Type Table................................................................................ 14
Figure 2.8 MFT Record Attribute Header ...................................................................................... 15
Section 2.7: Extracting Resident Attribute Data ............................................................................ 17
Figure 2.9 Resident Attribute Structure ......................................................................................... 18
Section 2.8: Extracting Non-Resident Attribute Data .................................................................... 18
Figure 2.10 Nonresident Attribute Structure.................................................................................. 18
Figure 2.11 A Theoretical Length Offset Union In C ..................................................................... 20
Section 2.9: Directories and the NTFS B+ Tree ............................................................................ 22
Figure 2.12 INDX Record Standard Header.................................................................................. 23
Figure 2.13 Individual INDX Record Entry Structure..................................................................... 24
Section 3: Simplified Theory of Data Extraction ............................................................................ 26
Appendix 1. Citations and References ......................................................................................... 27
Page 2 of 27
Document Introduction
It should be known that this paper does not intend to describe all
functional parts of the NTFS file system. For details on portions of the filesystem
which are not discussed with in this paper itself a developer would be best suited
to read the "ntfsdoc.pdf" file listed in the citations and references section of this
document. The “ntfsdoc.pdf” document was generated by the developers
responsible for the Linux NTFS drivers, and is not fully completed at the time of
this document however it still maintains a collection of information more useful to
a developer utilizing NTFS, than any other documentation freely available.
Additionally on the same topic, for additional algorithmic information detailing
how NTFS operations are performed, reading the NTFS Linux driver source code
provides an excellent reference for advanced developers. The author would
almost go as far as to consider reading relevant portions of the Linux drivers as a
prerequisite to this paper, if only to make the paper more understandable.
Page 3 of 27
Section 1: Information Pertaining To This Document
This is not to say that this paper will be a simple experience for a novice
programmer. The full extraction, from start to finish, of NTFS data takes a
significant number of calculations to simply reach the data to be extracted, let
alone the several more required to extract nonresident data from the physical
device. However, the reward of accomplishing such a task affords a developer
the ability to access data which is typically locked even by the kernel (e.g. the
page file). Several forensics toolkits utilize this technology for that exact
purpose, this paper attempts to document the process in full.
All code examples within this document are written utilizing the C
programming language. Before continuing any further it should be said that
without knowledge of the C programming language it will be very easy to get lost
within the concepts described by this document.
Page 4 of 27
can enable this functionality utilizing structure attributes. For a good example of
how to do this in Linux examine the source code (specifically the headers) found
in the NTFS -- 3 G. driver package.
Over the years there have been several iterations of the NTFS file system,
each of which bringing new features to end-users and application designers. The
first iteration version 1.0 was released in 1993 and was utilized by NT 3.1 as its
primary filesystem. Version 1.1 was released in 1994 with the release of NT 3.5.
The third iteration version 1.2 was an update that was available for NT 3.51 and
NT 4.0 mid-1995 and mid-1996. The newer Windows versions from Windows
2000 and above are utilizing either version 3.0 or version 3.1 as their core file
systems. Version 3.1 as of the writing of this paper is the current version of
NTFS.
NTFS has several features that make it a capable and robust journaling
filesystem. Features such as alternate data streams, quotas, sparse files, re-
parse points, volume mount points, directory junctions, hard links, hierarchical
storage management, file encryption and compression, and very high
performance make NTFS a well-suited choice for various intensive filesystem
tasks.
Page 5 of 27
Section 1.5: File Systems and Binary Trees
Page 6 of 27
Section 2: Accessing Data Through Raw Reads
Opening a disk for raw reading and writing is as simple as opening the
device \\.\PhysicalDriveN where N represents the number of the drive you want
to open for raw reading and writing. Once you open the device, navigating it
becomes the next obstacle. A physical disc can hold multiple different types of
partitions. Even though you're only interested as far as this paper is concerned
in the content of NTFS partitions some care should be taken to detail the steps it
takes to identify other partition types on a physical disk so that we can identify
which partition is truly NTFS. Figure 2.1 below shows how to use the create file
API routine in order to access the drive for physical reading.
Note: Before we continue it is important to know that when reading from an NTFS
drive, if you do not read in increments of 512 bytes (or the disk sector size) the
drive reads will fail.
The first sector on the primary disk is the master boot record. This record
is 512 bytes long and contains an area for boot code, an optional disk signature,
and a table of primary partitions with a maximum entry count of 4. This structure
is not defined in this document however the Wikipedia entry on the topic has a
full synopsis of all members of the MBR structure.
Page 7 of 27
To access partition information set the file pointer of the open drive to the
hexadecimal offset 0x1BE (offset to partition table in MBR). This moves us to the
beginning of our partition table on the disc. The structure which represents what
our partition table looks like is shown in the structure 2.2 below.
Page 8 of 27
Figure 2.3 Example Different Partition Types
Value Type
0x00 Empty
0x07 NTFS
0x83 Linux Native
0x82 Linux Swap or Solaris X86
Note: A full list of available partition type can be found at the link below.
http://www.win.tue.nl/~aeb/partitions/partition_types-1.html
Page 9 of 27
Figure 2.4 NTFS Boot Sector Layout
Page 10 of 27
Section 2.4: Master File Table (MFT)
Perhaps the most important element found inside the BPB is the cluster
number of the Master file table. NTFS uses this Master file table as a directory of
sorts, with its internal binary tree referencing the MFT similar to looking up phone
numbers in a phone book if you knew a correct page number in the phone book
for the person you were attempting to find. In reality a Master file table is simply
a series of clusters on the physical disk which holds file entries that can be
looked up by a "phone number" of sorts. This theoretical phone number, in
practice, is known as the MFT record number. Each record number corresponds
to one file entry, and each of these entries represents one file in the file system.
In the body of each MFT file entry, information about the file is stored in a series
of file attributes, stored in variably sized and labeled attribute structures.
To find the start of the MFT, first find the number of bytes per cluster by
multiplying the number of sectors per cluster times the number of bytes per
sector. Go ahead and put this number on the side we will be using it later. Next
will be using large integer arithmetic in order to determine where our proper
offset to the MFT is. Take the number of bytes per sector and multiply that first
by the NT relative sector found inside the partition table, take that number and
put it aside with the other result.
To complete the calculation take the number of bytes per cluster, which
was calculated earlier, and multiply it by the logical cluster number of the MFT
Page 11 of 27
which can be found inside the BIOS parameter block. The result of this
calculation can be added to the start of the NT relative data offset which we have
already calculated earlier in order to create an offset that we can use easily with
the SetFilePointer() Microsoft API routine to advance the file pointer to the
correct offset.
Figure 2.5 below describes the Master file table entry structure format that
is used with NTFS. Each structure starts off with a four byte file signature “FILE”
in ASCII at the start of each record, which makes record identification easy.
Each record is 1024 bytes, or the typical size of two sectors on the system.
Page 12 of 27
Section 2.5: Metafiles
Certain files in the MFT file table have been designated by the operating
system as metafiles which are of special-purpose. For example consider that the
MFT itself has its own MFT entry. Various metafiles contain important
information about the system. Figure 1.6 below shows a full listing of MFT
metafiles. This paper primarily focuses on the MFT metafiles, specifically the
root directory metafile and the MFT metafile, for the forensic process of recovery
of raw file system data.
5 . Root directory.
Page 13 of 27
A file which contains all the clusters marked as having
bad sectors. This file simplifies cluster management
8 $BadClus by the chkdsk utility, both as a place to put newly
discovered bad sectors, and for identifying
unreferenced clusters.
In order for the operating system to determine one MFT entry from
another it uses what are known as attribute headers. The table below shows
each different hexadecimal value which correspond to a record attribute type.
Page 14 of 27
First User Defined Attribute 0x1000
End of Attributes (records) 0xffffffff
To find the entry type of an attribute header, read in one (typically 1024
byte) MFT record from the start of file entry in the MFT into an MFT file structure
as shown in figure 2.5. Next use the attribute offset member of that structure as
an offset into the MFT record to the appropriate position for reading attributes.
Attributes are to be read in one at a time as they are variable in length, content,
purpose, and data type.
DWORD dwType;
DWORD dwFullLength;
BYTE uchNonResFlag;
BYTE uchNameLength;
WORD wNameOffset;
WORD wFlags;
WORD wID;
union ATTR {
struct RESIDENT {
DWORD dwLength;
WORD wAttrOffset;
BYTE uchIndexedTag;
BYTE uchPadding;
} Resident;
struct NONRESIDENT {
LONGLONG n64StartVCN;
LONGLONG n64EndVCN;
WORD wDatarunOffset;
WORD wCompressionSize;
BYTE uchPadding[4];
LONGLONG n64AllocSize;
LONGLONG n64RealSize;
LONGLONG n64StreamSize;
} NonResident;
} Attr;
} _NTFS_ATTRIBUTE, *P_NTFS_ATTRIBUTE;
Page 15 of 27
After reading in an attribute, during extraction of its relevant data, it is first
important to understand that the attribute header and any data header are
completely separate. The attribute header is there to help the programmer find
the relevant data for an attribute, not to be the relevant data header itself.
It is also important to be aware that the MFT record attribute data is stored
in two ways, both as resident and nonresident data. Resident data is literally
resident within the MFT record itself. If this is the case, which it typically is for
smaller files under 1024 bytes, the data for this file fits within the actual MFT
record entry in the MFT table.
Page 16 of 27
Section 2.7: Extracting Resident Attribute Data
Before we continue make sure you have handy the number of bytes per
cluster. If you haven't calculated this value previously, examine your NTFS boot
sector structure and multiply the two values found in the BIOS parameter block
that reflect the sectors per cluster and the bytes per sector. The result of this
multiplication is the number of bytes per cluster which we will be using later in
this document.
To read actual data which is marked resident in a MFT entry, start from
the beginning of your MFT record entry and move the byte index of the record
itself, to the attribute headers "resident" union structure. Next extract an offset
from that structure, for a correct offset appropriate for reading in an attribute
header. Now that you have the correct offset, set the correct offset position
within a MFT record buffer so that data can begin to be copied.
The number of bytes to read from this position is found in the union
resident structure itself. Figure 2.9 below describes the resident attribute
structure, and shows that the first member is a length field. Reading from the
correct position, at a maximum distance of that length value, resident data can be
extracted into a buffer for use by a developer.
Page 17 of 27
Figure 2.9 Resident Attribute Structure
struct RESIDENT {
DWORD dwLength;
WORD wAttrOffset;
BYTE uchIndexedTag;
BYTE uchPadding;
} Resident;
The extraction of nonresident data is a bit more difficult than resident data
to extract due to the additional calculations required to simply find the location of
the clusters holding the data. This is because of the fact that instead of being
stored resident in the header, the data is stored in one or multiple separate "runs"
(clusters) on the disk.
LONGLONG n64StartVCN;
LONGLONG n64EndVCN;
WORD wDatarunOffset;
WORD wCompressionSize; // compression unit size
BYTE uchPadding[4];
LONGLONG n64AllocSize;
LONGLONG n64RealSize;
LONGLONG n64StreamSize;
} NonResident;
Page 18 of 27
Understand that data stored nonresident from its MFT entry can, and often
will achieve sizes much greater then there is physical memory available to the
application. This makes reading data from physical runs into allocated buffers an
iterative process which usually ends in a sort of bucket brigade with the final
bucket typically being another out file. Due to this, when reading data which is
nonresident, it is very important to check the size. The real size of a file with all
of its runs combined is stored in a large integer value (eight bytes) as a member
inside of the nonresident structure (e.g. n64RealSize).
Begin extraction by creating a one byte value which will represent our
length offset size, the first byte from our read position should now be copied into
this variable. Using the variable you created to hold the nonresident structures,
data run offset earlier increment it by one byte, as this will be our next read
position.
Using the byte we just saved, we will need to split it into 4 bit pieces. The
one byte itself represents a bit field where the top four bits represent a length,
and the last four bits represent an offset. Calculating these two values can be
done by creating a split union typedef for a one byte value which can then be
used as a typecast when representing the length/offset bit-field. Figure 2.11
below shows how a union structure would look if created in this fashion in the C
programming language.
Page 19 of 27
Figure 2.11 A Theoretical Length Offset Union In C
Development Note: when using the offset member of the bit-field you have to bit
shift four bits to the right in order to get the correct offset value unless using the
union in figure 2.11.
To continue, create an eight byte value which will hold a copied record
length for our first data run. The four bit length field in our bit field union contains
the length in bytes from which to copy the full length of the data run. This length
will never exceed eight bytes in length, but can often be much shorter than eight
bytes. This dynamic sizing makes knowing the value in the bit field extremely
important, so that a programmer will not copy in bytes which were not intended
and skew results. Copy in the data from the current record position at the
specific length, and advance the current read position by that same length to
move to the next read offset.
From the newly set read position another read with a possible length of
eight bytes will be read; create another large integer value for the purpose of
storing this data. Using the offset member in the bit field shown in figure 2.11 as
a length parameter, copy in the data from the current read position into our new
variable using memcpy or a similar routine. Now that you've read in your offset
length, advance your read position by the offset member in the bit field to position
the correct placement in the read buffer.
Page 20 of 27
To maintain usability during the extraction of data runs (clusters), a new
large integer value should be created. Since this is the first run being extracted,
add the value of the extracted eight byte offset to the newly created large integer
value. Maintaining this offset will allow us to read multiple different data runs
much easier in the long run.
Once you have found the correct NT relative sector offset in bytes, add the
calculated logical cluster number for the data run to this offset to find our correct
read position. The number of bytes which we will be read, should be read in
increments of the systems cluster size. You should have calculated this value
earlier in this document and put it aside.
Begin reading clusters one by one until you read the correct number of
bytes which is defined by our previous length calculation. Once that is finished,
you have completed the extraction of the first data run. During this process you
can either read the data directly into a file, or into a memory buffer.
Authors Note: Preferably, due to the potential of large files when reading from a
data attribute, it's usually best to read this attribute type directly into a file.
In order to extract multiple data runs, the same steps can be repeated
from the next MFT records read position. Typically for loops, or while loops
which keep track of our attribute position within the MFT buffer, are best suited
for this purpose.
Page 21 of 27
Section 2.9: Directories and the NTFS B+ Tree
The actual contents of a directory entry in regard to files, are stored within
a separate attribute header within the same MFT record. The index allocation
attribute of an MFT record starts with a simple header then, stores contiguously
in 4096 byte allocations, the full directory contents as a sequence of INDX
records. Figure 2.12 shown below contains the structure of the header found at
the beginning of each 4096 byte INDX record.
Page 22 of 27
Figure 2.12 INDX Record Standard Header
char magicNumber[4];
LONGLONG logFileSeqNum;
LONGLONG vcnOfINDX;
DWORD indexEntryOffs;
DWORD sizeOFEntries;
DWORD sizeOfEntryAlloc;
BYTE flags;
BYTE padding[3];
} NTATTR_STANDARD_INDEX_HEADER, *P_NTATTR_STANDARD_INDEX_HEADER;
Each full INDX record as was said previously is 4096 bytes on a typical
system, each starting with a header, followed by individual INDX record entries.
Figure 2.13 below shows the structure of an individual INDX record entry as it
would appear when encountered in the index allocation after the INDX header on
the raw disk within an INDX record.
Page 23 of 27
Figure 2.13 Individual INDX Record Entry Structure
The first member of each INDX record entry contains the MFT record
number, which corresponds to the actual MFT record in the MFT itself. In order to
translate this MFT record number to the actual record entry in the MFT, multiply
the value by the size of an entry (typically 1024 bytes) and use that as an offset
from the start of the MFT to find the record. The MFT when searching for
records through offsets, has to be the complete MFT extracted from all runs
(clusters) on the disc, unless the developer only wants to search through entries
stored in the first data run of the MFT (e.g. metafiles).
When searching through INDX records for a particular file, accessing the
file name is of some importance. The file name character string (in Unicode), of
the file for which each Index record entry belongs, can be found directly after the
end of the Index record entry structure. Parts of the Index record entry structure
itself contain information about the dynamic length of this Unicode string as well
as a locale for the file name type (e.g. POSIX, Win32).
Page 24 of 27
For navigating INDX records without having to go through the trouble of
the calculation of elements and sizes for members, the INDX record contains a
member element for the "size of the index entry". Retrieving the correct offset to
the next INDX record is as simple as adding your current offset for reading with
the "size of the index entry" member.
Finding the end of the index entries can be done by taking the previous
read offset and verifying that it is not larger than the INDX standard index header
"size of entries" member. For loops or while loops are traditionally the best
logical programming constructs for executing a search through a sequence of
INDX records.
That is all there truly is to it, a balanced tree in the case of NTFS can be
witnessed as an INDX record which is a directory. Since each directory can have
sub nodes, which are directories, the tree can branch effectively. This is the
principle behind the B+ binary tree as implemented by Microsoft for the NTFS file
system.
Page 25 of 27
Section 3: Simplified Theory of Data Extraction
The following seven steps are the general steps that need to be taken in
order to extract data from files on the raw file system. Each of these steps are
documented in reliable detail in the above section two. All steps documented
have been used practically to perform this task in production environments.
3. Find the root directory metafile entry in the MFT and extract its index
allocation attribute data.
4. Process the INDX records found within the index allocation attribute data
one by one recursively until you find a file that matches the one you are
looking for.
5. In the index entry that matched, find the MFT record number and move to
that record position within the MFT.
6. Record the MFT entry and process its standard attribute headers one by
one until the data attribute is encountered.
7. Use the process of attribute data extraction in order to retrieve the data
attribute, which contains the contents of the file that is being accessed.
Page 26 of 27
Appendix 1. Citations and References
Linux NTFS Driver Project. "NTFS documentation",
Richard Russon and Yuval Fledel 2005
http://data.linux-ntfs.org/ntfsdoc.pdf
Additional Research Note: for additional information on NTFS which was not
utilized in this document but that could still be potentially useful, see the "external
links section" at the bottom of the NTFS Wikipedia entry.
Page 27 of 27