Вы находитесь на странице: 1из 27

NTFS Forensics:

A Programmers View of Raw Filesystem Data


Extraction
Jason Medeiros
Grayscale Research 2008

Page 1 of 27
Table of Contents:
Document Introduction .................................................................................................................... 3
Section 1: Information Pertaining To This Document...................................................................... 4
Section 1.1: Intended Audience....................................................................................................... 4
Section 1.2: Conventions in This Document ................................................................................... 4
Section 1.3: The Evolution of NTFS ................................................................................................ 5
Section 1.4: Features ...................................................................................................................... 5
Section 1.5: File Systems and Binary Trees ................................................................................... 6
Section 2: Accessing Data Through Raw Reads ............................................................................ 7
Section 2.1: Beginning The Process Of Extraction ......................................................................... 7
Figure 2.1 Opening The Physical Drive With Read Permissions .................................................... 7
Figure 2.2 Partition Structure........................................................................................................... 8
Section 2.2: Identifying a Partition Type......................................................................................... 8
Figure 2.3 Example Different Partition Types.................................................................................. 9
Section 2.3: The NTFS Boot Sector ................................................................................................ 9
Figure 2.4 NTFS Boot Sector Layout ............................................................................................ 10
Section 2.4: Master File Table (MFT) ............................................................................................ 11
Figure 2.5 MFT File Entry Header ................................................................................................. 12
Section 2.5: Metafiles .................................................................................................................... 13
Figure 2.6 MFT Special Files......................................................................................................... 13
Section 2.6: MFT Record Attributes .............................................................................................. 14
Figure 2.7 MFT Record Attribute Type Table................................................................................ 14
Figure 2.8 MFT Record Attribute Header ...................................................................................... 15
Section 2.7: Extracting Resident Attribute Data ............................................................................ 17
Figure 2.9 Resident Attribute Structure ......................................................................................... 18
Section 2.8: Extracting Non-Resident Attribute Data .................................................................... 18
Figure 2.10 Nonresident Attribute Structure.................................................................................. 18
Figure 2.11 A Theoretical Length Offset Union In C ..................................................................... 20
Section 2.9: Directories and the NTFS B+ Tree ............................................................................ 22
Figure 2.12 INDX Record Standard Header.................................................................................. 23
Figure 2.13 Individual INDX Record Entry Structure..................................................................... 24
Section 3: Simplified Theory of Data Extraction ............................................................................ 26
Appendix 1. Citations and References ......................................................................................... 27

Page 2 of 27
Document Introduction

The purpose of this document is to attempt, in a straightforward technical


manner, to describe the process of extracting data raw from an NTFS file system
from a physical device. The platform of discussion for this particular paper
centers on Win32 platforms running on NTFS version 3.0 or above.

Some of the concepts discussed in this document are rather difficult to


understand without a proper background in programming. That being said,
section 3 of this document still contains a very simplified seven step logical
breakdown of the steps required for NTFS file system data extraction. For
understanding the theory without the practical programming knowledge, consult
section 3 for a high-level overview.

It should be known that this paper does not intend to describe all
functional parts of the NTFS file system. For details on portions of the filesystem
which are not discussed with in this paper itself a developer would be best suited
to read the "ntfsdoc.pdf" file listed in the citations and references section of this
document. The “ntfsdoc.pdf” document was generated by the developers
responsible for the Linux NTFS drivers, and is not fully completed at the time of
this document however it still maintains a collection of information more useful to
a developer utilizing NTFS, than any other documentation freely available.
Additionally on the same topic, for additional algorithmic information detailing
how NTFS operations are performed, reading the NTFS Linux driver source code
provides an excellent reference for advanced developers. The author would
almost go as far as to consider reading relevant portions of the Linux drivers as a
prerequisite to this paper, if only to make the paper more understandable.

Page 3 of 27
Section 1: Information Pertaining To This Document

Section 1.1: Intended Audience

The audience for this paper should have a solid background in


programming and critical thinking. So long as the programmer has an
understanding of how to read and write files on a filesystem using a programming
language with disk access, this paper should be understandable in the purpose
of building raw NTFS extraction engines.

This is not to say that this paper will be a simple experience for a novice
programmer. The full extraction, from start to finish, of NTFS data takes a
significant number of calculations to simply reach the data to be extracted, let
alone the several more required to extract nonresident data from the physical
device. However, the reward of accomplishing such a task affords a developer
the ability to access data which is typically locked even by the kernel (e.g. the
page file). Several forensics toolkits utilize this technology for that exact
purpose, this paper attempts to document the process in full.

Section 1.2: Conventions in This Document

All code examples within this document are written utilizing the C
programming language. Before continuing any further it should be said that
without knowledge of the C programming language it will be very easy to get lost
within the concepts described by this document.

Additionally all C structures in this document should be considered to be


packed to a one byte alignment using the #pragma pack keyword. GNU Linux

Page 4 of 27
can enable this functionality utilizing structure attributes. For a good example of
how to do this in Linux examine the source code (specifically the headers) found
in the NTFS -- 3 G. driver package.

Section 1.3: The Evolution of NTFS

Over the years there have been several iterations of the NTFS file system,
each of which bringing new features to end-users and application designers. The
first iteration version 1.0 was released in 1993 and was utilized by NT 3.1 as its
primary filesystem. Version 1.1 was released in 1994 with the release of NT 3.5.
The third iteration version 1.2 was an update that was available for NT 3.51 and
NT 4.0 mid-1995 and mid-1996. The newer Windows versions from Windows
2000 and above are utilizing either version 3.0 or version 3.1 as their core file
systems. Version 3.1 as of the writing of this paper is the current version of
NTFS.

Section 1.4: Features

NTFS has several features that make it a capable and robust journaling
filesystem. Features such as alternate data streams, quotas, sparse files, re-
parse points, volume mount points, directory junctions, hard links, hierarchical
storage management, file encryption and compression, and very high
performance make NTFS a well-suited choice for various intensive filesystem
tasks.

Page 5 of 27
Section 1.5: File Systems and Binary Trees

No matter which filesystem is being discussed, they all rely on the


efficiency of binary trees to make ease of complicated hierarchical storage
requirements. The reason for this is that binary trees afford the file system
developer the simplest and most efficient solution to the problem at hand.

Algorithmically NTFS utilizes a B+ tree in order to maintain a perfect


balance between of efficiency in both large file operations and small file
operations. As a primer before reading this document consider reading the
Wikipedia entry on B+ trees, as it will provide valuable insight into what
theoretical data structures you will be encountering within this paper. A link to
the entry at Wikipedia can be found in the citations and references section of this
paper.

Page 6 of 27
Section 2: Accessing Data Through Raw Reads

Section 2.1: Beginning The Process Of Extraction

Opening a disk for raw reading and writing is as simple as opening the
device \\.\PhysicalDriveN where N represents the number of the drive you want
to open for raw reading and writing. Once you open the device, navigating it
becomes the next obstacle. A physical disc can hold multiple different types of
partitions. Even though you're only interested as far as this paper is concerned
in the content of NTFS partitions some care should be taken to detail the steps it
takes to identify other partition types on a physical disk so that we can identify
which partition is truly NTFS. Figure 2.1 below shows how to use the create file
API routine in order to access the drive for physical reading.

Figure 2.1 Opening The Physical Drive With Read Permissions


HANDLE hd = CreateFile(“\\\\.\\PhysicalDrive0”, GENERIC_READ, FILE_SHARE_READ, 0,
OPEN_EXISTING, 0, NULL);

Note: Before we continue it is important to know that when reading from an NTFS
drive, if you do not read in increments of 512 bytes (or the disk sector size) the
drive reads will fail.

The first sector on the primary disk is the master boot record. This record
is 512 bytes long and contains an area for boot code, an optional disk signature,
and a table of primary partitions with a maximum entry count of 4. This structure
is not defined in this document however the Wikipedia entry on the topic has a
full synopsis of all members of the MBR structure.

Page 7 of 27
To access partition information set the file pointer of the open drive to the
hexadecimal offset 0x1BE (offset to partition table in MBR). This moves us to the
beginning of our partition table on the disc. The structure which represents what
our partition table looks like is shown in the structure 2.2 below.

Figure 2.2 Partition Structure


typedef struct _PARTITION {
BYTE chBootInd;
BYTE chHead;
BYTE chSector;
BYTE chCylinder;
BYTE chType; // <- partition type identifier
BYTE chLastHead;
BYTE chLastSector;
BYTE chLastCylinder;
DWORD dwRelativeSector;
DWORD dwNumberSectors;
}PARTITION, *P_PARTITION;

The partition table by itself is nothing more than a contiguous sequence of


partition structures one following the other. Multiple partition table entries can
exist on the same host disk, so being able to sort through them becomes
important especially in the case of file system forensics. Identifying partitions is
as simple as looking up the partition type member.

Section 2.2: Identifying a Partition Type


Looking at the structure of figure 2.2 you can see the parameter chType
which is used to identify a partition type. NTFS has a partition type number
designation of 0x07. If you match the chType variable from a partition table entry
with this number you’ve identified a NTFS partition.

Page 8 of 27
Figure 2.3 Example Different Partition Types

Value Type
0x00 Empty
0x07 NTFS
0x83 Linux Native
0x82 Linux Swap or Solaris X86

Note: A full list of available partition type can be found at the link below.
http://www.win.tue.nl/~aeb/partitions/partition_types-1.html

Section 2.3: The NTFS Boot Sector


As soon as you've identified an NTFS partition you can follow the “relative
sector” number offset to the NTFS boot sector itself. Multiply the sector number
by the size of a sector, and you have the correct offset from which to set the read
file pointer. Figure 2.4 below describes the aspects of our boot sector.

Page 9 of 27
Figure 2.4 NTFS Boot Sector Layout

typedef struct _NTFS_PART_BOOT_SECTOR {


char chJumpInstruction[3];
char chOemID[4];
char chDummy[4];
struct NTFS_BPB {
WORD wBytesPerSec;
BYTE uchSecPerClust;
WORD wReservedSec;
BYTE uchReserved[3];
WORD wUnused1;
BYTE uchMediaDescriptor;
WORD wUnused2;
WORD wSecPerTrack;
WORD wNumberOfHeads;
DWORD dwHiddenSec;
DWORD dwUnused3;
DWORD dwUnused4;
LONGLONG n64TotalSec;
LONGLONG n64MFTLogicalClustNum; // ( cluster for MFT
LONGLONG n64MFTMirrLogicalClustNum;
int nClustPerMFTRecord;
int nClustPerIndexRecord;
LONGLONG n64VolumeSerialNum;
DWORD dwChecksum;
} bpb;
char chBootstrapCode[426];
WORD wSecMark;
} NTFS_BOOT_SECTOR, *P_NTFS_BOOT_SECTOR;

The NTFS boot sector contains important internal pieces of information


which can be used for the intent of disk forensics. For one it contains the
bootstrap code which is used to start Windows as well as a jump instruction that
is used to start the operating system. Within the NTFS boot sector you can find
what is known as the BPB or bios parameter block. The bios parameter block
serves as a description of the physical medium being used to store the NTFS file
system itself. In Figure 2.4 above, the BPB is shown as a substructure inside the
NTFS boot sector, as that is how it is placed in the boot sector during drive
reading.

Page 10 of 27
Section 2.4: Master File Table (MFT)

Perhaps the most important element found inside the BPB is the cluster
number of the Master file table. NTFS uses this Master file table as a directory of
sorts, with its internal binary tree referencing the MFT similar to looking up phone
numbers in a phone book if you knew a correct page number in the phone book
for the person you were attempting to find. In reality a Master file table is simply
a series of clusters on the physical disk which holds file entries that can be
looked up by a "phone number" of sorts. This theoretical phone number, in
practice, is known as the MFT record number. Each record number corresponds
to one file entry, and each of these entries represents one file in the file system.
In the body of each MFT file entry, information about the file is stored in a series
of file attributes, stored in variably sized and labeled attribute structures.

To find the start of the MFT, first find the number of bytes per cluster by
multiplying the number of sectors per cluster times the number of bytes per
sector. Go ahead and put this number on the side we will be using it later. Next
will be using large integer arithmetic in order to determine where our proper
offset to the MFT is. Take the number of bytes per sector and multiply that first
by the NT relative sector found inside the partition table, take that number and
put it aside with the other result.

The next part of the calculation requires us to multiply the NT relative


sector, which can be found in the partition table, with the number of bytes per
sector which we can find in the BIOS parameter block in the NTFS boot sector.
This effectively grows our large integer to a number that borders the start of our
NT relative data. The MFT offset is relative to this position.

To complete the calculation take the number of bytes per cluster, which
was calculated earlier, and multiply it by the logical cluster number of the MFT

Page 11 of 27
which can be found inside the BIOS parameter block. The result of this
calculation can be added to the start of the NT relative data offset which we have
already calculated earlier in order to create an offset that we can use easily with
the SetFilePointer() Microsoft API routine to advance the file pointer to the
correct offset.

Figure 2.5 below describes the Master file table entry structure format that
is used with NTFS. Each structure starts off with a four byte file signature “FILE”
in ASCII at the start of each record, which makes record identification easy.
Each record is 1024 bytes, or the typical size of two sectors on the system.

Figure 2.5 MFT File Entry Header

typedef struct _NTFS_MFT_FILE_ENTRY_HEADER {


char fileSignature[4];
WORD wFixupOffset;
WORD wFixupSize;
LONGLONG n64LogSeqNumber;
WORD wSequence;
WORD wHardLinks;
WORD wAttribOffset;
WORD wFlags;
DWORD dwRecLength;
DWORD dwAllLength;
LONGLONG n64BaseMftRec;
WORD wNextAttrID;
WORD wFixupPattern;
DWORD dwMFTRecNumber;
} NTFS_MFT_FILE_ENTRY_HEADER, *P_NTFS_MFT_FILE_ENTRY_HEADER;

Page 12 of 27
Section 2.5: Metafiles

Certain files in the MFT file table have been designated by the operating
system as metafiles which are of special-purpose. For example consider that the
MFT itself has its own MFT entry. Various metafiles contain important
information about the system. Figure 1.6 below shows a full listing of MFT
metafiles. This paper primarily focuses on the MFT metafiles, specifically the
root directory metafile and the MFT metafile, for the forensic process of recovery
of raw file system data.

Figure 2.6 MFT Special Files

MFT record number File Name Special file purpose

The Master file table used to describe the positions,


names, timestamps, and all other pertinent pieces of
0 $MFT
information useful in the identification and lookup of
individual files.

A stored copy of the first four MFT record entries,


1 $MFTMirr which are used to restore a partition in the case of
repair.

Transaction log of the file system, used to record


2 $LogFile
changes for file system journaling.

Contains various data elements, such as the volume


3 $Volume object identifier (VOI), volume label, volume flags, and
file system version.

A table of NTFS specific attributes, data structure


4 $AttrDef
sizes, and other useful information.

5 . Root directory.

Bit field representing if a particular cluster on the


6 $Bitmap
volume is either used or free.

7 $Boot Volume boot record.

Page 13 of 27
A file which contains all the clusters marked as having
bad sectors. This file simplifies cluster management
8 $BadClus by the chkdsk utility, both as a place to put newly
discovered bad sectors, and for identifying
unreferenced clusters.

Access control list database which reduces overhead


9 $Secure having many identical ACLs stored with each file, by
uniquely storing these ACLs in this database only.

A table of unicode uppercased characters for ensuring


10 $UpCase
case insensitivity in Win32 and DOS namespaces.

A filesystem directory containing various optional


11 $Extend extensions, such as $Quota, $ObjId, $Reparse or
$UsnJrnl.

Section 2.6: MFT Record Attributes

In order for the operating system to determine one MFT entry from
another it uses what are known as attribute headers. The table below shows
each different hexadecimal value which correspond to a record attribute type.

Figure 2.7 MFT Record Attribute Type Table

Attribute Name Hexidecimal Value


Unused 0x00
Standard Information 0x10
File Name 0x30
Object ID 0x40
Security Descriptor 0x50
Volume Name 0x60
Volume Information 0x70
Data 0x80
Index Root 0x90
Index Allocation 0xa0
Bitmap 0xb0
Reparse Point 0xc0
EA Information 0xd0
EA 0xe0
Property Set 0xf0
Logged Utility Stream 0x100

Page 14 of 27
First User Defined Attribute 0x1000
End of Attributes (records) 0xffffffff

To find the entry type of an attribute header, read in one (typically 1024
byte) MFT record from the start of file entry in the MFT into an MFT file structure
as shown in figure 2.5. Next use the attribute offset member of that structure as
an offset into the MFT record to the appropriate position for reading attributes.
Attributes are to be read in one at a time as they are variable in length, content,
purpose, and data type.

Figure 2.8 MFT Record Attribute Header

typedef struct _NTFS_ATTRIBUTE {

DWORD dwType;
DWORD dwFullLength;
BYTE uchNonResFlag;
BYTE uchNameLength;
WORD wNameOffset;
WORD wFlags;
WORD wID;

union ATTR {

struct RESIDENT {
DWORD dwLength;
WORD wAttrOffset;
BYTE uchIndexedTag;
BYTE uchPadding;
} Resident;

struct NONRESIDENT {

LONGLONG n64StartVCN;
LONGLONG n64EndVCN;
WORD wDatarunOffset;
WORD wCompressionSize;
BYTE uchPadding[4];
LONGLONG n64AllocSize;
LONGLONG n64RealSize;
LONGLONG n64StreamSize;

} NonResident;

} Attr;

} _NTFS_ATTRIBUTE, *P_NTFS_ATTRIBUTE;

Page 15 of 27
After reading in an attribute, during extraction of its relevant data, it is first
important to understand that the attribute header and any data header are
completely separate. The attribute header is there to help the programmer find
the relevant data for an attribute, not to be the relevant data header itself.

It is also important to be aware that the MFT record attribute data is stored
in two ways, both as resident and nonresident data. Resident data is literally
resident within the MFT record itself. If this is the case, which it typically is for
smaller files under 1024 bytes, the data for this file fits within the actual MFT
record entry in the MFT table.

On the contrary, if an attribute is marked as nonresident it means that it is


stored in a sequence of runs (clusters) elsewhere on the disk. At this point take
note of the internal ATTR union in figure 2.8, depending on the value of the
uchNonResFlag member of that structure, either one union value or the other is
chosen. If the flag is not true the data is resident in the MFT record and if it is
true the data is nonresident and stored elsewhere on the disk in a series of runs
representing multiple clusters.

In order to properly operate on NTFS attributes, they must first be


extracted from the clusters on where they reside. The next section of this
document describes how to extract attribute data from the file system. It should
be noted that the data attribute literally contains information that in the next
section can be used for extracting resident or nonresident attribute data.

Page 16 of 27
Section 2.7: Extracting Resident Attribute Data

Extracting resident data from a disk is a relatively simple procedure when


compared to nonresident data. In order to extract resident data you first have to
determine that the data you are trying to store is in fact resident in the MFT entry.
The uchNonResFlag member inside of the attribute header will indicate whether
or not a records data is resident or not.

Before we continue make sure you have handy the number of bytes per
cluster. If you haven't calculated this value previously, examine your NTFS boot
sector structure and multiply the two values found in the BIOS parameter block
that reflect the sectors per cluster and the bytes per sector. The result of this
multiplication is the number of bytes per cluster which we will be using later in
this document.

To read actual data which is marked resident in a MFT entry, start from
the beginning of your MFT record entry and move the byte index of the record
itself, to the attribute headers "resident" union structure. Next extract an offset
from that structure, for a correct offset appropriate for reading in an attribute
header. Now that you have the correct offset, set the correct offset position
within a MFT record buffer so that data can begin to be copied.

The number of bytes to read from this position is found in the union
resident structure itself. Figure 2.9 below describes the resident attribute
structure, and shows that the first member is a length field. Reading from the
correct position, at a maximum distance of that length value, resident data can be
extracted into a buffer for use by a developer.

Page 17 of 27
Figure 2.9 Resident Attribute Structure
struct RESIDENT {

DWORD dwLength;
WORD wAttrOffset;
BYTE uchIndexedTag;
BYTE uchPadding;

} Resident;

Section 2.8: Extracting Non-Resident Attribute Data

The extraction of nonresident data is a bit more difficult than resident data
to extract due to the additional calculations required to simply find the location of
the clusters holding the data. This is because of the fact that instead of being
stored resident in the header, the data is stored in one or multiple separate "runs"
(clusters) on the disk.

Below in figure 2.10. the nonresident attribute structure is defined.


Remember that this nonresident attribute structure is really a union with the
resident attribute structure, and remember again that both of these union
structures are part of the overlying NTFS attributes structure for MFT entries.

Figure 2.10 Nonresident Attribute Structure


struct NONRESIDENT {

LONGLONG n64StartVCN;
LONGLONG n64EndVCN;
WORD wDatarunOffset;
WORD wCompressionSize; // compression unit size
BYTE uchPadding[4];
LONGLONG n64AllocSize;
LONGLONG n64RealSize;
LONGLONG n64StreamSize;

} NonResident;

Page 18 of 27
Understand that data stored nonresident from its MFT entry can, and often
will achieve sizes much greater then there is physical memory available to the
application. This makes reading data from physical runs into allocated buffers an
iterative process which usually ends in a sort of bucket brigade with the final
bucket typically being another out file. Due to this, when reading data which is
nonresident, it is very important to check the size. The real size of a file with all
of its runs combined is stored in a large integer value (eight bytes) as a member
inside of the nonresident structure (e.g. n64RealSize).

At this point it becomes impossible to continue extraction without making a


few calculations. First, for ease of use store the nonresident structures data run
offset member in an easy to reference local variable, or make note of it because
it will be used in calculations shortly. For the purpose of clarification, when we
make our first read from our record buffer we will be first reading from this
position.

Begin extraction by creating a one byte value which will represent our
length offset size, the first byte from our read position should now be copied into
this variable. Using the variable you created to hold the nonresident structures,
data run offset earlier increment it by one byte, as this will be our next read
position.

Using the byte we just saved, we will need to split it into 4 bit pieces. The
one byte itself represents a bit field where the top four bits represent a length,
and the last four bits represent an offset. Calculating these two values can be
done by creating a split union typedef for a one byte value which can then be
used as a typecast when representing the length/offset bit-field. Figure 2.11
below shows how a union structure would look if created in this fashion in the C
programming language.

Page 19 of 27
Figure 2.11 A Theoretical Length Offset Union In C

typedef struct _LEN_OFFS_BITFIELD {


union {
BYTE val;
struct {
unsigned char offs:4;
unsigned char len: 4;
} bitfield;
};
} LEN_OFFS_BITFIELD, *P_LEN_OFFS_BITFIELD;

Development Note: when using the offset member of the bit-field you have to bit
shift four bits to the right in order to get the correct offset value unless using the
union in figure 2.11.

To continue, create an eight byte value which will hold a copied record
length for our first data run. The four bit length field in our bit field union contains
the length in bytes from which to copy the full length of the data run. This length
will never exceed eight bytes in length, but can often be much shorter than eight
bytes. This dynamic sizing makes knowing the value in the bit field extremely
important, so that a programmer will not copy in bytes which were not intended
and skew results. Copy in the data from the current record position at the
specific length, and advance the current read position by that same length to
move to the next read offset.

From the newly set read position another read with a possible length of
eight bytes will be read; create another large integer value for the purpose of
storing this data. Using the offset member in the bit field shown in figure 2.11 as
a length parameter, copy in the data from the current read position into our new
variable using memcpy or a similar routine. Now that you've read in your offset
length, advance your read position by the offset member in the bit field to position
the correct placement in the read buffer.

Page 20 of 27
To maintain usability during the extraction of data runs (clusters), a new
large integer value should be created. Since this is the first run being extracted,
add the value of the extracted eight byte offset to the newly created large integer
value. Maintaining this offset will allow us to read multiple different data runs
much easier in the long run.

Once you have found the correct NT relative sector offset in bytes, add the
calculated logical cluster number for the data run to this offset to find our correct
read position. The number of bytes which we will be read, should be read in
increments of the systems cluster size. You should have calculated this value
earlier in this document and put it aside.

Begin reading clusters one by one until you read the correct number of
bytes which is defined by our previous length calculation. Once that is finished,
you have completed the extraction of the first data run. During this process you
can either read the data directly into a file, or into a memory buffer.

Authors Note: Preferably, due to the potential of large files when reading from a
data attribute, it's usually best to read this attribute type directly into a file.

In order to extract multiple data runs, the same steps can be repeated
from the next MFT records read position. Typically for loops, or while loops
which keep track of our attribute position within the MFT buffer, are best suited
for this purpose.

Page 21 of 27
Section 2.9: Directories and the NTFS B+ Tree

In the world of NTFS everything is a file, and that includes directories.


When people discuss NTFS in terms of a high-performance filesystem is typically
because of the way it's implemented. When people talk about B+ trees in terms
of the NTFS file system they're really talking about directories and how
directories reference data in the Master file table. You see a B+ tree is really
almost precisely the same as it looks in Explorer, where every directory can
contain files or other directories and if other directories are contained, those
directories can contain more directories and so on.

In order first to continue it is important to know that one of the metafiles in


the MFT reflects the root system directory. This root system directory is the true
B+ tree utilized by NTFS.

In order to determine if an MFT record entry is a directory the simplest


way to find this information is to look at record attribute headers, and examine
them one by one until the end of records with the intent of looking for an attribute
of type "index root". If you find this index root attribute, you are currently looking
at the MFT entry for a directory. You can test this theory on the root directory in
order to verify this principle.

The actual contents of a directory entry in regard to files, are stored within
a separate attribute header within the same MFT record. The index allocation
attribute of an MFT record starts with a simple header then, stores contiguously
in 4096 byte allocations, the full directory contents as a sequence of INDX
records. Figure 2.12 shown below contains the structure of the header found at
the beginning of each 4096 byte INDX record.

Page 22 of 27
Figure 2.12 INDX Record Standard Header

typedef struct _NTATTR_STANDARD_INDEX_HEADER {

char magicNumber[4];

unsigned short updateSeqOffs;


unsigned short sizeOfUpdateSequenceNumberInWords;

LONGLONG logFileSeqNum;
LONGLONG vcnOfINDX;

DWORD indexEntryOffs;
DWORD sizeOFEntries;
DWORD sizeOfEntryAlloc;

BYTE flags;
BYTE padding[3];

unsigned short updateSeq;

} NTATTR_STANDARD_INDEX_HEADER, *P_NTATTR_STANDARD_INDEX_HEADER;

In the standard index header the four byte structure element,


magicNumber contains the actual ASCII string "INDX" as a marker to denote the
actual beginning of the index standard header. This makes it simple to find INDX
records during application testing. It is important to know that INDX records are
the actual B+ tree entries themselves fundamentally.

Each full INDX record as was said previously is 4096 bytes on a typical
system, each starting with a header, followed by individual INDX record entries.
Figure 2.13 below shows the structure of an individual INDX record entry as it
would appear when encountered in the index allocation after the INDX header on
the raw disk within an INDX record.

Page 23 of 27
Figure 2.13 Individual INDX Record Entry Structure

typedef struct _NTATTR_INDEX_RECORD_ENTRY {


LONGLONG mftReference;
unsigned short sizeOfIndexEntry;
unsigned short filenameOffset;
unsigned short flags;
char padding[2];
LONGLONG mftFileReferenceOfParent;
LONGLONG creationTime;
LONGLONG lastModified;
LONGLONG lastModifiedForFileRecord;
LONGLONG lastAccessTime;
LONGLONG allocatedSizeOfFile;
LONGLONG realFileSize;
LONGLONG fileFlags;
BYTE fNameLength;
BYTE filenameNamespace;
} _NTATTR_INDEX_RECORD_ENTRY, *P_NTATTR_INDEX_RECORD_ENTRY;

The first member of each INDX record entry contains the MFT record
number, which corresponds to the actual MFT record in the MFT itself. In order to
translate this MFT record number to the actual record entry in the MFT, multiply
the value by the size of an entry (typically 1024 bytes) and use that as an offset
from the start of the MFT to find the record. The MFT when searching for
records through offsets, has to be the complete MFT extracted from all runs
(clusters) on the disc, unless the developer only wants to search through entries
stored in the first data run of the MFT (e.g. metafiles).

When searching through INDX records for a particular file, accessing the
file name is of some importance. The file name character string (in Unicode), of
the file for which each Index record entry belongs, can be found directly after the
end of the Index record entry structure. Parts of the Index record entry structure
itself contain information about the dynamic length of this Unicode string as well
as a locale for the file name type (e.g. POSIX, Win32).

Page 24 of 27
For navigating INDX records without having to go through the trouble of
the calculation of elements and sizes for members, the INDX record contains a
member element for the "size of the index entry". Retrieving the correct offset to
the next INDX record is as simple as adding your current offset for reading with
the "size of the index entry" member.

Finding the end of the index entries can be done by taking the previous
read offset and verifying that it is not larger than the INDX standard index header
"size of entries" member. For loops or while loops are traditionally the best
logical programming constructs for executing a search through a sequence of
INDX records.

That is all there truly is to it, a balanced tree in the case of NTFS can be
witnessed as an INDX record which is a directory. Since each directory can have
sub nodes, which are directories, the tree can branch effectively. This is the
principle behind the B+ binary tree as implemented by Microsoft for the NTFS file
system.

Page 25 of 27
Section 3: Simplified Theory of Data Extraction
The following seven steps are the general steps that need to be taken in
order to extract data from files on the raw file system. Each of these steps are
documented in reliable detail in the above section two. All steps documented
have been used practically to perform this task in production environments.

1. Find the partition entry which holds the NTFS partition

2. Navigate to the beginning of the Master File Table

3. Find the root directory metafile entry in the MFT and extract its index
allocation attribute data.

4. Process the INDX records found within the index allocation attribute data
one by one recursively until you find a file that matches the one you are
looking for.

5. In the index entry that matched, find the MFT record number and move to
that record position within the MFT.

6. Record the MFT entry and process its standard attribute headers one by
one until the data attribute is encountered.

7. Use the process of attribute data extraction in order to retrieve the data
attribute, which contains the contents of the file that is being accessed.

Page 26 of 27
Appendix 1. Citations and References
Linux NTFS Driver Project. "NTFS documentation",
Richard Russon and Yuval Fledel 2005
http://data.linux-ntfs.org/ntfsdoc.pdf

Wikipedia entry on NTFS "NTFS",


http://en.wikipedia.org/wiki/NTFS

Wikipedia Entry On The BIOS Parameter Block "BIOS Parameter Block",


http://en.wikipedia.org/wiki/BIOS_parameter_block

Wikipedia Entry on the Master Boot Record "Master Boot Record",


http://en.wikipedia.org/wiki/Master_boot_record

Additional Research Note: for additional information on NTFS which was not
utilized in this document but that could still be potentially useful, see the "external
links section" at the bottom of the NTFS Wikipedia entry.

Page 27 of 27

Вам также может понравиться