Testing your hard drive in Linux

I recently needed to test a hard drive in Linux, and had a hard time finding out how to do it
properly. In DOS, you can run a Surface Scan in Scandisk. Linux does not have anything
called Surface Scan, however. In Linux, it is called checking for bad blocks.
What is a block? Loosely speaking, it's a fixed-size chunk of the partition. So if you have a
40 gig partition, it is divided into a whole bunch of numbered blocks that might be 4096 bytes
each. Block 0 is the first 4096 bytes, Block 1 the second 4096 bytes, and so on. An important
thing you should know is that the "blocks" are a part of the filesystem. At time of formatting,
a blocksize is chosen for the filesystem. The partition itself does not have a blocksize; the
filesystem does.
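To make the block numbering concrete, here is a small shell arithmetic sketch. The 4096-byte block size and the 40 GiB partition size are just the example figures from the paragraph above, and the variable names are made up for illustration.

```shell
# Example figures only: a 40 GiB partition with 4096-byte blocks.
block_size=4096
partition_bytes=$(( 40 * 1024 * 1024 * 1024 ))

# How many blocks a filesystem with this block size would have.
total_blocks=$(( partition_bytes / block_size ))

# Byte range covered by a given block number (block 1 is the second block).
block_number=1
first_byte=$(( block_number * block_size ))
last_byte=$(( first_byte + block_size - 1 ))

echo "total blocks: $total_blocks"
echo "block $block_number covers bytes $first_byte to $last_byte"
```

With these figures, block 1 starts right where block 0 ends, at byte 4096.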
If part of your hard drive is damaged, the block or blocks that contain the bad part should
be marked bad. This means the block number is added to a list of bad blocks, and the list is
then given to the filesystem on the partition. The filesystem stores the list and remembers
not to use those blocks. If you use e2fsck, the process of giving the list to the filesystem
is automated. Since that prevents errors, it is preferable.
There are 2 general ways to find the bad blocks.
The first way is to just try reading every block. If one of the reads causes the hard drive to
throw an error, the block in question is marked bad. This, however, is not the best way,
because a bad part of the disk does not always throw an error when it is read. The second,
slower method is to write data to every block on the hard disk and make sure it is the same
when it is read back. It is possible to do this without erasing the data in your partition,
but that takes even longer. This second method, read/write, is what is done in a DOS Surface
Scan.
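The read/write idea can be tried out safely on an ordinary scratch file. The sketch below is only an illustration of write-then-verify, not how badblocks itself is implemented. The file names, the 8-block size, and the 0xAA test pattern are assumptions made for the demo. A real tool also has to make sure the read comes from the disk and not from a cache.

```shell
# Safe demo of write-then-verify on a scratch file, NOT on a real disk.
block_size=4096
blocks=8
img=$(mktemp)        # stands in for the partition
pattern=$(mktemp)    # one block filled with the test pattern 0xAA
readback=$(mktemp)   # holds the block as read back from the "disk"

dd if=/dev/zero of="$img" bs="$block_size" count="$blocks" 2>/dev/null
dd if=/dev/zero bs="$block_size" count=1 2>/dev/null | LC_ALL=C tr '\0' '\252' > "$pattern"

bad_list=""
i=0
while [ "$i" -lt "$blocks" ]; do
    # Write the pattern into block i, leaving the other blocks alone.
    dd if="$pattern" of="$img" bs="$block_size" seek="$i" conv=notrunc 2>/dev/null
    # Read block i back and compare it with what was written.
    dd if="$img" of="$readback" bs="$block_size" skip="$i" count=1 2>/dev/null
    if ! cmp -s "$pattern" "$readback"; then
        bad_list="$bad_list $i"
    fi
    i=$(( i + 1 ))
done

echo "bad blocks:${bad_list:- none}"
rm -f "$img" "$pattern" "$readback"
```

On a healthy scratch file every block reads back correctly, so the list stays empty.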
Programs to use
In Linux, there is pretty much only one program that is used to check for bad blocks. It is
called, surprisingly enough, "badblocks". You should only use this program directly, though,
when you are checking a blank partition, or a filesystem other than ext2 or ext3. When
checking an ext2 or ext3 filesystem, you should use e2fsck, which runs badblocks for you.
Using e2fsck
You should use this when checking an ext2 or ext3 filesystem. The 2 methods below
automatically save the bad blocks found into the filesystem so that those parts of the hard
drive are no longer used.
Read-only method: e2fsck -c -C 0 /dev/hda1 ---OR--- e2fsck -c -C 0 -y /dev/hda1 (The -y
answers yes to all questions, so it is sure to finish by itself.)
Non-destructive read/write method: e2fsck -c -c -C 0 /dev/hda1 ---OR--- e2fsck -c -c -C 0 -y
/dev/hda1 (The -y answers yes to all questions, so it is sure to finish by itself.)
The -C switch needs a file descriptor number after it; 0 makes e2fsck draw a progress bar on
the screen.
Note: The filesystem must NOT be mounted. You therefore have to use a rescue CD if you need
to check the root filesystem. I recommend this CD: http://rescuecd.sourceforge.net/
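Before running e2fsck, it can help to confirm that the partition really is unmounted. The function below is a rough sketch that looks the device up in /proc/mounts. The function name is_mounted is made up for this example, and the check is only approximate: a root filesystem may be listed under another name such as /dev/root.

```shell
# Succeeds (exit status 0) if the device appears in the mounts table.
# The optional second argument lets the check be tried on a sample file.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

if is_mounted /dev/hda1; then
    echo "/dev/hda1 is mounted; unmount it or boot a rescue CD first"
else
    echo "/dev/hda1 does not appear to be mounted"
fi
```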

Using badblocks
You should use this when checking a blank partition. You can also use it on a partition with a
filesystem other than ext2 or ext3. There might be an equivalent of e2fsck for your
filesystem, though, so you might try that first. When you use badblocks, the bad blocks list
for your partition will not be saved in the filesystem automatically. It is possible to save
the bad blocks list and then have the filesystem read in that list. The catch is that you must
set the blocksize in badblocks to the blocksize the filesystem has, or will have. Otherwise
the block numbers will not correspond to the blocks in that filesystem. I'm not going to
describe how to import the block list into the filesystem. You can read the man pages for
that information.
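To see why the block sizes have to match, the sketch below works out which filesystem block a reported block number really belongs to when the two sizes differ. The 1024 and 4096 figures and the block number 1001 are made-up examples.

```shell
scan_block_size=1024   # block size badblocks was run with (example)
fs_block_size=4096     # block size of the filesystem (example)
scan_block=1001        # a block number badblocks reported (example)

# Where the bad spot starts on the partition, in bytes.
byte_offset=$(( scan_block * scan_block_size ))

# The filesystem block that contains that byte.
fs_block=$(( byte_offset / fs_block_size ))

echo "scan block $scan_block = byte $byte_offset = filesystem block $fs_block"
```

In this example the bad spot sits in filesystem block 250, so handing the number 1001 to the filesystem unchanged would mark the wrong place.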
Read-only method: badblocks -b 4096 -p 4 -c 32768 -s /dev/hda1
The number after -b is the block size. 4096 means 4096 bytes. You don't need to change this
unless you're using the bad blocks list for something.
The number after -p is the number of passes it should run on the hard drive. The 4 means it
will stop testing the hard drive after it has tested the entire hard drive 4 times in a row
without the bad blocks list changing. So if it finds new bad blocks on the third pass, and
none after that, it will have done 7 passes altogether. If you don't want to do multiple
passes, you can skip this switch to save time.
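The pass counting can be checked with a small simulation. This is only a sketch of the stopping rule as described above, not code taken from badblocks. The function name count_passes and its arguments are invented for the example, and it assumes the required number of clean passes is at least 1.

```shell
# Count how many passes a "-p"-style stopping rule would take.
# $1 = required consecutive clean passes (assumed to be 1 or more).
# Remaining args = new bad blocks found on pass 1, pass 2, and so on;
# passes after the end of the list are treated as clean.
count_passes() {
    needed=$1
    shift
    passes=0
    clean=0
    while [ "$clean" -lt "$needed" ]; do
        passes=$(( passes + 1 ))
        new=${1:-0}
        if [ "$#" -gt 0 ]; then shift; fi
        if [ "$new" -gt 0 ]; then
            clean=0                  # list changed, start counting again
        else
            clean=$(( clean + 1 ))
        fi
    done
    echo "$passes"
}

count_passes 4 0 0 5   # new bad blocks on the third pass only; prints 7
```

The third pass resets the count, and four clean passes follow it, which gives the 7 passes mentioned above.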
The number after -c is the number of blocks it tests at a time. The default is 16 (64 in newer
versions of badblocks). The -b number * the -c number equals the number of bytes of RAM it
will use. You should probably use as much of your available memory as possible to save time.
Just make sure you don't use too much. You certainly wouldn't want this data to be swapped. If
you run out of physical and swap memory, the program will just crash. The above settings
(4096 * 32768 bytes) use 128 megs of RAM.
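The memory figure is easy to check with shell arithmetic. This sketch just multiplies the -b and -c values from the read-only command above; the variable names are made up.

```shell
block_size=4096        # the -b value
blocks_at_once=32768   # the -c value

buffer_bytes=$(( block_size * blocks_at_once ))
buffer_mib=$(( buffer_bytes / 1024 / 1024 ))

echo "read-only test buffer: $buffer_bytes bytes ($buffer_mib MiB)"
```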
Destructive read/write method: badblocks -b 4096 -p 4 -c 16384 -w -s /dev/hda1 (The -w switch
overwrites the whole partition, so everything on it is erased.)
The -b and -p switches work the same way as in the read-only method above.
The number after -c is the number of blocks it tests at a time. The default is 16 (64 in newer
versions of badblocks). The -b number * the -c number * 2 equals the number of bytes of RAM it
will use. You should probably use as much of your available memory as possible to save time.
Just make sure you don't use too much. You certainly wouldn't want this data to be swapped. If
you run out of physical and swap memory, the program will just crash. The above settings
(4096 * 16384 * 2 bytes) use 128 megs of RAM.
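The same rule can be turned around to pick a -c value from the amount of RAM you are willing to spend. This is a sketch based on the "times 2" rule described above. The 128 MiB budget is an example figure, and the script only prints the resulting command; it does not run badblocks.

```shell
block_size=4096      # the -b value
ram_budget_mib=128   # RAM you are willing to give badblocks (example)
buffers=2            # write mode uses twice the memory, per the rule above

blocks_at_once=$(( ram_budget_mib * 1024 * 1024 / (block_size * buffers) ))

# Print the command for review instead of running it: -w erases the partition.
echo "badblocks -b $block_size -c $blocks_at_once -w -s /dev/hda1"
```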
Other things missing from this page
There is a non-destructive read/write mode of badblocks, selected with the -n switch. (You
should use e2fsck for ext2 and ext3 filesystems, though.)
If your hard drive has bad blocks randomly scattered throughout, it is probably shot. If they
are localized to a small area, then it is more likely still usable.
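If you have saved a bad blocks list to a file (one block number per line), a quick summary shows whether the bad blocks are bunched together or spread out. The function name summarise_bad_blocks and the sample block numbers below are made up, and the sketch deliberately leaves the judgment of "scattered" or "localized" to you.

```shell
# Summarise a bad blocks list file with one block number per line.
summarise_bad_blocks() {
    sort -n "$1" | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END {
            if (NR == 0) { print "no bad blocks"; exit }
            print NR " bad blocks between block " first " and block " last
        }'
}

list=$(mktemp)
printf '5003\n5000\n5001\n' > "$list"   # made-up block numbers
summarise_bad_blocks "$list"
rm -f "$list"
```

A handful of bad blocks within a narrow range points to a localized problem; the same count spread over the whole partition is a worse sign.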
-Aaron Talbot
talasonic@earthlink.net