
HCL Infosystems Ltd

HDD & File System

Hard disks were invented in the 1950s. They started as large disks up to 20 inches in
diameter holding just a few megabytes. They were originally called "fixed disks" or
"Winchesters" (a code name used for a popular IBM product). They later became known as
"hard disks" to distinguish them from "floppy disks." Hard disks have a hard platter that holds
the magnetic medium, as opposed to the flexible plastic film found in tapes and floppies.

At the simplest level, a hard disk is not that different from a cassette tape. Both hard disks
and cassette tapes use the same magnetic recording techniques described in the HowStuffWorks
article "How Tape Recorders Work." Hard disks and cassette tapes also share the major
benefits of magnetic storage - the magnetic medium can be easily erased and rewritten, and
it will "remember" the magnetic flux patterns stored onto the medium for many years.

Let's look at the main differences between cassette tapes and hard disks:

• The magnetic recording material on a cassette tape is coated onto a thin plastic strip.
In a hard disk, the magnetic recording material is layered onto a high-precision
aluminum or glass disk. The hard disk platter is then polished to mirror smoothness.
• With a tape, you have to fast-forward or reverse through the tape to get to any
particular point on the tape. This can take several minutes with a long tape. On a
hard disk you can move to any point on the surface of the disk almost instantly.
• In a cassette tape deck, the read/write head touches the tape directly. In a hard disk
the read/write head "flies" over the disk, never actually touching it.
• The tape in a cassette tape deck moves over the head at about 2 inches (about 5.08
cm) per second. A hard disk platter can spin underneath its head at speeds up to
3,000 inches per second (about 170 mph or 274 km/h)!
• The information on a hard disk is stored in extremely small magnetic domains
compared to a cassette tape's. The size of these domains is made possible by the
precision of the platter and the speed of the media.
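
The speed comparison in the list above comes down to unit arithmetic, which can be checked with a short sketch. The function names and the 3.5-inch platter diameter are illustrative assumptions; only the 3,000 inches-per-second figure and the 7,200 RPM spin rate appear in the text.

```python
import math

INCHES_PER_MILE = 63_360          # 5,280 feet x 12 inches
INCHES_PER_KM = 100_000 / 2.54    # 1 inch is exactly 2.54 cm

def ips_to_mph(inches_per_second):
    """Convert a linear speed from inches per second to miles per hour."""
    return inches_per_second * 3600 / INCHES_PER_MILE

def ips_to_kmh(inches_per_second):
    """Convert a linear speed from inches per second to kilometers per hour."""
    return inches_per_second * 3600 / INCHES_PER_KM

def edge_speed_ips(platter_diameter_in, rpm):
    """Linear speed at the outer edge of a spinning platter, in inches per second."""
    return math.pi * platter_diameter_in * rpm / 60

# The figure quoted for a hard disk platter
print(f"{ips_to_mph(3000):.0f} mph, {ips_to_kmh(3000):.0f} km/h")

# Hypothetical 3.5-inch platter spinning at 7,200 RPM
print(f"{edge_speed_ips(3.5, 7200):.0f} inches per second at the edge")
```

The edge speed depends on both the diameter and the spin rate, so smaller or slower platters move well below the quoted maximum.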

Because of these differences, a modern hard disk is able to store an amazing amount of
information in a small space. A hard disk can also access any of its information in a fraction
of a second.

A typical desktop machine will have a hard disk with a capacity of between 10 and 40
gigabytes. Data is stored onto the disk in the form of files. A file is simply a named collection
of bytes. The bytes might be the ASCII codes for the characters of a text file, or they could
be the instructions of a software application for the computer to execute, or they could be the
records of a database, or they could be the pixel colors for a GIF image. No matter what it
contains, however, a file is simply a string of bytes. When a program running on the
computer requests a file, the hard disk retrieves its bytes and sends them to the CPU one at
a time.

There are two ways to measure the performance of a hard disk:

• The data rate - the number of bytes per second that the drive can deliver to the CPU.
Rates between 5 and 40 megabytes per second are common.
• The seek time - the amount of time between the CPU requesting a
file and the first byte of the file being sent to the CPU. Times between 10 and
20 milliseconds are common.
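
The two measures above can be combined into a rough estimate of how long it takes to read a file. This is a minimal sketch under simplifying assumptions: the file is read in one contiguous pass after a single seek, and a megabyte is taken as 1,000,000 bytes. The function name is hypothetical.

```python
def estimated_read_time_ms(file_bytes, seek_time_ms, data_rate_mb_per_s):
    """Rough read time: one seek plus a transfer at a constant data rate.

    Assumes a contiguous file and 1 MB = 1,000,000 bytes.
    """
    transfer_seconds = file_bytes / (data_rate_mb_per_s * 1_000_000)
    return seek_time_ms + transfer_seconds * 1000

# A 1 MB file on a drive with a 15 ms seek time and a 20 MB/s data rate
print(f"{estimated_read_time_ms(1_000_000, 15, 20):.1f} ms")

# A tiny 1 KB file on the same drive: the seek time dominates
print(f"{estimated_read_time_ms(1_000, 15, 20):.2f} ms")
```

For small files almost all of the time goes to seeking, which is why seek time matters as much as the raw data rate.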

Here is a typical hard disk drive:

It is a sealed aluminum box with controller electronics attached to one side. The electronics
control the read/write mechanism and the motor that spins the platters. The electronics also
assemble the magnetic domains on the drive into bytes (reading) and turn bytes into
magnetic domains (writing). The electronics are all contained on a small board that detaches
from the rest of the drive:

Underneath the board are the connections for the motor that spins the platters, as well as a
highly-filtered vent hole that lets internal and external air pressures equalize:


Removing the cover from the drive reveals an extremely simple but very precise interior:

In this picture you can see:

• The platters, which typically spin at 3,600 or 7,200 RPM when the drive is operating.
These platters are manufactured to amazing tolerances and are mirror smooth (as
you can see in this interesting self-portrait of the author... No easy way to avoid that,
actually!)
• The arm that holds the read/write heads. This arm is controlled by the mechanism in
the upper-left corner, and is able to move the heads from the hub to the edge of the
drive. The arm and its movement mechanism are extremely light and fast. The arm
on a typical hard disk drive can move from hub to edge and back up to 50 times per
second - it is an amazing thing to watch!

In order to increase the amount of information the drive can store, most hard disks have
multiple platters. This drive has three platters and six read-write heads:


The mechanism that moves the arms on a hard disk has to be incredibly fast and precise. It
can be constructed using a high-speed linear motor.

Many drives use a "voice coil" approach - the same technique used to move the cone of a
speaker on your stereo moves the arm.


Storing the Data

Data is stored on the surface of a platter in sectors and tracks. Tracks are concentric
circles, and sectors are pie-shaped wedges on a track, like this:

A typical track is shown in yellow; a typical sector is shown in blue. A sector contains a fixed
number of bytes -- for example, 256 or 512. Either at the drive or the operating system level,
sectors are often grouped together into clusters.
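
The relationship between surfaces, tracks, sectors and clusters can be sketched as simple multiplication. The geometry numbers below are hypothetical, and the sketch assumes every track holds the same number of sectors, which is a simplification. Only the sector sizes and the six-head example come from the text.

```python
def disk_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                        bytes_per_sector=512):
    """Capacity of an idealized disk where every track has the same sector count."""
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector

def cluster_size_bytes(sectors_per_cluster, bytes_per_sector=512):
    """Size of a cluster formed by grouping consecutive sectors."""
    return sectors_per_cluster * bytes_per_sector

# Hypothetical geometry: 6 surfaces (three platters), 1,000 tracks per surface,
# 63 sectors per track, 512 bytes per sector
capacity = disk_capacity_bytes(6, 1000, 63)
print(f"{capacity:,} bytes")

# Grouping 8 sectors of 512 bytes gives a 4 KB cluster
print(f"{cluster_size_bytes(8):,} bytes per cluster")
```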

The process of low-level formatting a drive establishes the tracks and sectors on the
platter. The starting and ending points of each sector are written onto the platter. This
process prepares the drive to hold blocks of bytes. High-level formatting then writes the
file-storage structures, like the file allocation table, into the sectors. This process prepares
the drive to hold files.

Hard Disk Logical Structures and File Systems


The hard disk is, of course, a medium for storing information. Hard disks grow in size every
year, and as they get larger, using them in an efficient way becomes more difficult. The file
system is the general name given to the logical structures and software routines used to
control access to the storage on a hard disk system. Operating systems use different ways
of organizing and controlling access to data on the hard disk, and this choice is basically
independent of the specific hardware being used--the same hard disk can be arranged in
many different ways, and even multiple ways in different areas of the same disk. The
information in this section in fact straddles the fine line between hardware and software, a
line which gets more and more blurry every year.

The nature of the logical structures on the hard disk has an important influence on the
performance, reliability, expandability and compatibility of your storage subsystem. This
section takes a look at the logical structures on the hard disk and how they are set up and
used for a typical PC installation. I begin with a discussion of different PC operating systems,
and an overview of different file system types. I then go into significant detail describing the
major structures and key operating details of the most common PC file system, FAT
(FAT12/FAT16/VFAT/FAT32). I talk about utilities used for partitioning and formatting hard
disks, and also talk a bit about disk compression (even though it is no longer nearly as
important as it once was.) I place special emphasis on how to organize the disk for
maximum performance--while not getting bogged down in the minutiae of optimization where
it will buy you little.

Most of the focus in this section is on the FAT family of file systems, because these are by
far the most commonly used, and also the ones with which I am most familiar. I do mention
alternative file systems, but do not go into extensive detail on them, with one exception.
Recognizing the growing role of Windows NT and Windows 2000 systems, a separate,
comprehensive section has been added that describes the NTFS family of file systems. If
you are mostly interested in reading about NTFS, you may want to skip some of the earlier
subsections that describe FAT, and skip directly to the NTFS material. Bear in mind,
however, that some of the NTFS discussions build upon the descriptions of FAT, since in
some ways the file systems are related. So I recommend reading the section in order, if
possible.

FAT Sizes: FAT12, FAT16 and FAT32


Throughout my discussion of file systems, I have referred to the FAT family of file systems.
This includes several different FAT-related file systems, as described here. The file allocation
table or FAT stores information about the clusters on the disk in a table. There are three
different varieties of this file allocation table, which vary based on the maximum size of the
table. The system utility that you use to partition the disk will normally choose the correct
type of FAT for the volume you are using, but sometimes you will be given a choice of which
you want to use.

Since each cluster has one entry in the FAT, and these entries are used to hold the cluster
number of the next cluster used by the file, the size of the FAT is the limiting factor on how
many clusters any disk volume can contain. The following are the three different FAT
versions now in use:

• FAT12: The oldest type of FAT uses a 12-bit binary number to hold the cluster
number. A volume formatted using FAT12 can hold a maximum of 4,086 clusters,
which is 2^12 minus a few values (to allow for reserved values to be used in the
FAT). FAT12 is therefore most suitable for very small volumes, and is used on floppy
disks and hard disk partitions smaller than about 16 MB (the latter being rare today.)
• FAT16: The FAT used for older systems, and for small partitions on modern systems,
uses a 16-bit binary number to hold cluster numbers. When you see someone refer
to a "FAT" volume generically, they are usually referring to FAT16, because it is the
de facto standard for hard disks, even with FAT32 now more popular than FAT16. A
volume using FAT16 can hold a maximum of 65,526 clusters, which is 2^16 less a
few values (again for reserved values in the FAT). FAT16 is used for hard disk
volumes ranging in size from 16 MB to 2,048 MB. VFAT is a variant of FAT16.
• FAT32: The newest FAT type, FAT32 is supported by newer versions of Windows,
including Windows 95's OEM SR2 release, as well as Windows 98, Windows ME and
Windows 2000. FAT32 uses a 28-bit binary cluster number--not 32, because 4 of the
32 bits are "reserved". 28 bits is still enough to permit ridiculously huge volumes--
FAT32 can theoretically handle volumes with over 268 million clusters, and will
support (theoretically) drives up to 2 TB in size. However, to do this the size of the
FAT grows very large, which is one of FAT32's practical limitations.
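
The idea that each FAT entry holds the number of the next cluster used by the file can be illustrated with a toy table. This is a sketch, not the on-disk format: the dictionary layout, the END_OF_CHAIN marker and the function names are assumptions made for illustration (real FATs use reserved entry values for this purpose).

```python
END_OF_CHAIN = -1  # hypothetical marker; a real FAT uses reserved values instead

def cluster_chain(fat, first_cluster):
    """Follow FAT entries from a file's first cluster until the chain ends."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            raise ValueError("loop detected in cluster chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # the entry names the next cluster of the file
    return chain

def addressable_entries(entry_bits):
    """Number of distinct values an entry of the given width can hold."""
    return 2 ** entry_bits

# Toy FAT: one file occupies clusters 2 -> 3 -> 7, another occupies 4 -> 5
toy_fat = {2: 3, 3: 7, 7: END_OF_CHAIN, 4: 5, 5: END_OF_CHAIN}
print(cluster_chain(toy_fat, 2))
print(cluster_chain(toy_fat, 4))

# Entry width bounds the cluster count (before reserved values are subtracted)
for bits in (12, 16, 28):
    print(bits, "bits:", f"{addressable_entries(bits):,}")
```

Because every cluster needs its own entry, the entry width caps how many clusters a volume can have, which is why the three FAT variants differ mainly in that width.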

Here's a summary table showing how the three types of FAT compare:

Attribute               FAT12                 FAT16                 FAT32

Used For                Floppies and very     Small to moderate-    Medium-sized to very
                        small hard disk       sized hard disk       large hard disk
                        volumes               volumes               volumes
Size of Each FAT Entry  12 bits               16 bits               28 bits
Maximum Number of       4,086                 65,526                ~268,435,456
Clusters
Cluster Size Used       0.5 KB to 4 KB        2 KB to 32 KB         4 KB to 32 KB
Maximum Volume Size     16,736,256            2,147,123,200         about 2^41
(bytes)

FAT Partition Efficiency: Slack


One issue related to the FAT file system that has gained a lot more attention over the years
is the concept of slack, which is the colloquial term used to refer to wasted space due to the
use of clusters for storing files. This attention began in the mid-1990s, when larger and larger
hard disks began shipping with most systems. Typically, the disks in retail systems were not
divided into multiple partitions, and users began noticing that large quantities of their hard
disk space seemed to "disappear". In many cases this amounted to hundreds of megabytes on a
disk of only 1 to 2 GB in size. When the use of FAT32 became more common, this problem was
less of an issue for a while. Today, with hard disks of 40 GB or more commonplace, even FAT32
has problems with slack.

Of course the space doesn't really "disappear", assuming we are not talking about lost
clusters, which can make space really unusable on a disk unless you use a scanning utility
to recover it. The space is simply wasted as a result of the cluster system that FAT uses. A
cluster is the minimum amount of space that can be assigned to any file. No file can use part
of a cluster under the FAT file system. This means, essentially, that the amount of space a
file uses on the disk is "rounded up" to an integer multiple of the cluster size. If you create a
file containing exactly one byte, it will still use an entire cluster's worth of space. Then, you
can expand that file in size until it reaches the maximum size of a cluster, and it will take up
no additional space during that expansion. As soon as you make the file larger than what a
single cluster can hold, a second cluster will be allocated, and the file's disk usage will
double, even though the file only increased in size by one byte.

Think of this in terms of collecting rainwater in quart-sized glass bottles. Even if you collect
just one ounce of water, you have to use a whole bottle. Once the bottle is in use, however,
you can fill it with 31 more ounces, until it is full. Then you'll need another whole bottle to
hold the 33rd ounce.
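
The "rounding up" to whole clusters described above can be sketched in a few lines. The function names are hypothetical, and treating an empty file as occupying no clusters is an assumption of this sketch.

```python
def allocated_bytes(file_size, cluster_size):
    """Disk space used by a file: its size rounded up to whole clusters.

    Assumption: a zero-byte file is treated as occupying no clusters.
    """
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size

def slack_bytes(file_size, cluster_size):
    """Unused space left at the end of the file's last cluster."""
    return allocated_bytes(file_size, cluster_size) - file_size

CLUSTER = 32 * 1024  # a 32 KB cluster

for size in (1, CLUSTER, CLUSTER + 1):
    print(f"{size:>6} byte file -> {allocated_bytes(size, CLUSTER):>6} bytes on disk, "
          f"{slack_bytes(size, CLUSTER):>6} bytes of slack")
```

A one-byte file and a file that exactly fills the cluster both use one cluster. Adding a single byte beyond that doubles the disk usage.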

Since files are always allocated whole clusters, this means that on average, the larger the
cluster size of the volume, the more space will be wasted. (When collecting rainwater,
it's more efficient to use smaller, cup-sized bottles instead of quart-sized ones, if minimizing
the amount of storage space is a concern). If we take a disk that has a truly random
distribution of file sizes, then on average each file wastes half a cluster. (They use any
number of whole clusters and then a random amount of the last cluster, so on average half a
cluster is wasted). This means that if you double the cluster size of the disk, you double the
amount of storage that is wasted. Storage space that is wasted in this manner, due to space
left at the end of the last cluster allocated to the file, is commonly called slack.

The situation is in reality usually worse than this theoretical average. The files on most hard
disks don't follow a random size pattern; in fact most files tend to be small in size. (Take a
look in your web browser's cache directory sometime!) A hard disk that holds many small files
wastes far more space. There are utilities that you can use to analyze the
amount of wasted space on your disk volumes, such as the fantastic Partition Magic. It is not
uncommon for very large disks that are in single partitions to waste up to 40% of their space
due to slack, although 25-30% is more common.

Let's take an example to illustrate the situation. Consider a hard disk volume that is
using 32 KB clusters. There are 17,000 files in the partition. If we assume that each file has
half a cluster of slack, then we are wasting 16 KB of space per file. Multiply
that by 17,000 files, and we get a total of 265 MB of slack space. If we assume that most of
the files are smaller, so that on average each file has slack space of around two-thirds
of a cluster instead of one-half, this jumps to 354 MB!

If we were able to use a smaller cluster size for this disk, the amount of space wasted would
be reduced dramatically. The table below shows a comparison of the slack for various cluster
sizes for this example. The more files on the disk, the worse the slack gets. To find the
percentage of disk space wasted in this example, divide the slack figure by the size of the
disk. So if this were a (full) 1.2 GB disk using 32 KB clusters, then taking the 354 MB figure,
a full 30% of that space would be slack. If the disk were 2.1 GB in size, the slack percentage
would be 17%:

Cluster Size    Sample Slack Space,            Sample Slack Space,
                50% Cluster Slack Per File     67% Cluster Slack Per File
2 KB            17 MB                          22 MB
4 KB            33 MB                          44 MB
8 KB            66 MB                          89 MB
16 KB           133 MB                         177 MB
32 KB           265 MB                         354 MB
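
The figures in this table follow from one multiplication per cell, which the sketch below recomputes. It assumes 1 MB = 1,024 KB and treats the "67%" column as two-thirds of a cluster. Under those assumptions the results match the table to within rounding. The function name is hypothetical.

```python
def total_slack_mb(file_count, cluster_kb, slack_fraction):
    """Total slack in MB when each file wastes a fixed fraction of one cluster.

    Assumes 1 MB = 1,024 KB.
    """
    return file_count * cluster_kb * slack_fraction / 1024

FILES = 17_000  # number of files in the example partition

print("Cluster   50% slack   2/3 slack")
for cluster_kb in (2, 4, 8, 16, 32):
    half = total_slack_mb(FILES, cluster_kb, 1 / 2)
    two_thirds = total_slack_mb(FILES, cluster_kb, 2 / 3)
    print(f"{cluster_kb:>4} KB  {half:>7.1f} MB  {two_thirds:>7.1f} MB")
```

Since the slack scales linearly with cluster size, each doubling of the cluster size doubles the wasted space for the same set of files.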

As you can see, the larger the cluster size used, the more of the disk's space is wasted due
to slack. Therefore, it is better to use smaller cluster sizes whenever possible. This is,
unfortunately, sometimes easier said than done. The number of clusters we can use is
limited by the nature of the FAT file system, and there are also performance tradeoffs in
using smaller cluster sizes. Therefore, it isn't always possible to use the absolute smallest
cluster size in order to maximize free space. One way that cluster sizes can be reduced is to
use FAT32 instead of FAT16, as described in other pages in this section. However, on very
large modern hard disks, big partitions even in FAT32 use rather hefty cluster sizes!

Windows 2000
Windows NT was very successful for Microsoft through the 1990s, but the software giant
didn't rest on its laurels. As Windows NT 4.0 began to age, certain flaws began to show,
including a lack of support for the latest hardware and other limitations. From a file systems
perspective, the most important was the lack of support for FAT32. Microsoft addressed
some of these through the use of service packs, but mostly concentrated on the next version
of the operating system. It had been unofficially called "Windows NT 5.0" for some time, but
Microsoft instead called the new operating system Windows 2000.

Windows 2000 builds upon Windows NT 4.0 in most respects, and differs from the older
operating system in two ways when it comes to file systems. The first is the addition of
support for FAT32. This was a much-desired change, especially with FAT32 all but replacing
FAT16 in newer Windows 9x/ME systems. The other was that NTFS under Windows 2000
was enhanced, through the creation of the NTFS 5.0 version of that file system. Windows
2000 will still read older NTFS partitions, but it must be installed on an NTFS 5.0 partition;
NTFS 5.0 is Windows 2000's "preferred" file system.
OS/2
In the 1980s, two of the biggest names in the PC world, IBM and Microsoft, joined
forces to create OS/2, with the goal of making it the "next big thing" in graphical operating
systems. Well, it didn't quite work out that way. The story behind OS/2 includes some of the
most fascinating bits of PC industry history, but it's a long story and not one that really
makes sense to get into here. The short version goes something like this:

1. Microsoft and IBM create OS/2 with high hopes that it will revolutionize the PC
desktop.
2. OS/2 has some significant technical strengths but also some problems.
3. Microsoft and IBM fight over how to fix the problems, and also over what direction to
take for the future of the operating system.
4. Microsoft decides, based on some combination of frustration over problems and
desire for absolute control, to drop OS/2 and focus on Windows instead.
5. IBM and Microsoft feud.
6. IBM supports OS/2 (somewhat half-heartedly) on its own, while Microsoft dominates
the industry with various versions of Windows.

OS/2's file system support is similar, in a way, to that of Windows NT. OS/2 supports FAT12
and FAT16 for compatibility, but is really designed to use its own special file system, called
HPFS. HPFS is similar to NTFS (NT's native file system) though it is certainly not the same.
OS/2 does not have support for FAT32 built in, but there are third-party tools available
that will let OS/2 access FAT32 partitions. This may be required if you are running a machine
with both OS/2 and Windows partitions. I believe that OS/2 does not include support for
NTFS partitions.
