
Storage and File Structure

Several types of data storage exist in most computer systems. These storage media are classified
by the speed with which data can be accessed, by the cost per unit of data to buy the medium,
and by the medium’s reliability.
I. Magnetic Disk

Magnetic disks provide the bulk of secondary storage for modern computer systems. A very
large database may require hundreds of disks.
1.1 Physical Characteristics of Disks

Physically, disks are relatively simple. Each disk platter has a flat, circular shape. Its two
surfaces are covered with a magnetic material, and information is recorded on the surfaces.
Platters are made from rigid metal or glass.
When the disk is in use, a drive motor spins it at a constant high speed. There is a read–write
head positioned just above the surface of the platter. The disk surface is logically divided into
tracks, which are subdivided into sectors. A sector is the smallest unit of information that can
be read from or written to the disk.
The read–write head stores information on a sector magnetically as reversals of the direction of
magnetization of the magnetic material. Each side of a platter of a disk has a read–write head that
moves across the platter to access different tracks. A disk typically contains many platters, and
the read–write heads of all the platters are mounted on a single assembly called a disk arm, and
move together. The disk platters mounted on a spindle and the heads mounted on a disk arm are
together known as head–disk assemblies. Since the heads on all the platters move together,
when the head on one platter is on the ith track, the heads on all the other platters are also on the
ith track of their respective platters. Hence, the ith tracks of all the platters together are called the
ith cylinder.
Disk Controller:
A disk controller interfaces between the computer system and the actual hardware of the disk
drive.
Functions of disk controller:
1. A disk controller accepts high-level commands to read or write a sector, and initiates
actions, such as moving the disk arm to the right track and actually reading or writing the
data.
2. Disk controllers also attach checksums to each sector that is written; the checksum is
computed from the data written to the sector. When the sector is read back, the controller
computes the checksum again from the retrieved data and compares it with the stored
checksum. If the two do not match, the controller will retry the read several times; if the
error continues to occur, the controller will signal a read failure.
3. Another interesting task that disk controllers perform is remapping of bad sectors. If the
controller detects that a sector is damaged when the disk is initially formatted, or when an
attempt is made to write the sector, it can logically map the sector to a different physical
location.
Disks are connected to a computer system through a high-speed interconnection. There are a
number of common interfaces for connecting disks to computers, of which the most commonly
used today are:
(1) SATA (serial ATA)
(2) SCSI (small-computer-system interconnect)
(3) SAS (serial attached SCSI)
(4) the Fibre Channel interface

1.2 Performance Measures of Disks


The main measures of the qualities of a disk are capacity, access time, data-transfer rate, and
reliability.
Access Time :
Access time is the time from when a read or write request is issued to when data transfer begins.
To access data on a given sector of a disk, the arm must first move so that it is positioned over the
correct track; the time taken to reposition the arm is called the seek time. Once the head has
reached the desired track, the time spent waiting for the sector to be accessed to appear under the
head is called the rotational latency. The access time is then the sum of the seek time and the
rotational latency.
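As a rough worked example (the figures are assumed for illustration, not taken from the text): a disk with an average seek time of 4 ms that spins at 7,200 revolutions per minute completes one rotation in 60,000/7,200 ≈ 8.33 ms, so its average rotational latency is about half of that, roughly 4.17 ms, and its average access time is therefore about 4 + 4.17 ≈ 8.2 ms.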
Data-transfer rate :
The data-transfer rate is the rate at which data can be retrieved from or stored to the disk.
Reliability :
The final commonly used measure of a disk is the mean time to failure (MTTF), which is a
measure of the reliability of the disk. The mean time to failure of a disk (or of any other system)
is the amount of time that, on average, we can expect the system to run continuously without any
failure.
II. RAID
A large number of disks are needed to store the data of applications such as Web, database, and
multimedia applications. A variety of disk-organization techniques, collectively called redundant
arrays of independent disks (RAID), have been proposed to achieve improved performance and
reliability. RAID systems are used for their higher reliability and higher data-transfer rate.
Another key justification for RAID use is easier management and operations.
Improvement of Reliability via Redundancy
The solution to the problem of reliability is to introduce redundancy. The simplest approach to
introducing redundancy is to duplicate every disk. This technique is called mirroring. A logical
disk then consists of two physical disks, and every write is carried out on both disks. If one of the
disks fails, the data can be read from the other. Data will be lost only if the second disk fails
before the first failed disk is repaired.
Improvement in Performance via Parallelism
Data Striping:
With multiple disks, we can improve the transfer rate by striping data across multiple disks.
Bit-level striping consists of splitting the bits of each byte across multiple disks.
For example, if we have an array of eight disks, we write bit i of each byte to disk i. The array of
eight disks can be treated as a single disk with sectors that are eight times the normal size, and,
more important, that has eight times the transfer rate. In such an organization, every disk
participates in every access (read or write).
Block-level striping stripes blocks across multiple disks. It treats the array of disks as a single
large disk, and it gives blocks logical numbers; we assume the block numbers start from 0. With
an array of n disks, block-level striping assigns logical block i of the disk array to disk (i mod n)
+ 1; it uses the ⌊i/n⌋th physical block of the disk to store logical block i.
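A minimal sketch of this placement rule (the function name and the four-disk example are illustrative, not from the text):

def place_block(i, n):
    """Map logical block i of an n-disk array to (disk number, physical block),
    following the rule above: disk (i mod n) + 1, physical block floor(i / n)."""
    return (i % n) + 1, i // n

# For a 4-disk array, logical blocks 0..7 land on disks 1,2,3,4,1,2,3,4
# and on physical blocks 0,0,0,0,1,1,1,1.
for i in range(8):
    print("logical block", i, "->", place_block(i, 4))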
2.1 RAID Levels
Various alternative schemes provide redundancy at lower cost by combining disk striping with
“parity” bits. The schemes are classified into RAID levels.
RAID level 0 (Block-level striping)
Refers to disk arrays with striping at the level of blocks, but without any redundancy (such as
mirroring or parity bits). Figure shows an array of size 4.
RAID level 1 (mirroring with block striping)
Refers to disk mirroring with block striping. Figure shows a mirrored organization that holds
four disks’ worth of data.
RAID level 2 (memory-style error-correcting-codes)
RAID level 2, known as memory-style error-correcting-code (ECC) organization, employs parity bits.
Memory systems have long used parity bits for error detection and correction. The idea of error-
correcting codes can be used directly in disk arrays by striping bytes across disks. For example,
the first bit of each byte could be stored in disk 0, the second bit in disk 1, and so on until the
eighth bit is stored in disk 7, and the error-correction bits are stored in further disks.
The disks labeled P store the error-correction bits. If one of the disks fails, the remaining bits of
the byte and the associated error-correction bits can be read from other disks, and can be used to
reconstruct the damaged data.
The figure shows an array of size 4; note that RAID level 2 requires only three disks' overhead for four
disks of data, unlike RAID level 1, which required four disks' overhead.
RAID level 3 (Bit interleaved parity)
Bit-interleaved parity organization improves on level 2. Disk controllers, unlike memory
systems, can detect whether a sector has been read correctly, so a single parity bit can be used
for error correction as well as for detection.
The idea is as follows: If one of the sectors gets damaged, the system knows exactly which sector
it is, and, for each bit in the sector, the system can figure out whether it is a 1 or a 0 by
computing the parity of the corresponding bits from sectors in the other disks. If the parity of the
remaining bits is equal to the stored parity, the missing bit is 0; otherwise, it is 1.
RAID level 3 has only a one-disk overhead when compared to level 2.
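The parity rule described above is equivalent to taking the exclusive-or (XOR) of the corresponding bits on the surviving disks. A minimal sketch of the reconstruction (the function name and sample values are illustrative, not from the text):

def recover_sector(surviving):
    """Rebuild a lost sector as the XOR of the corresponding bytes of all
    surviving sectors (the remaining data sectors plus the parity sector)."""
    out = bytearray(len(surviving[0]))
    for sector in surviving:
        for i, b in enumerate(sector):
            out[i] ^= b
    return bytes(out)

# Three one-byte data sectors and their parity; pretend d1 is lost and recover it:
d0, d1, d2 = b"\x0f", b"\x33", b"\x55"
parity = bytes(a ^ b ^ c for a, b, c in zip(d0, d1, d2))
assert recover_sector([d0, d2, parity]) == d1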
RAID level 4 (Block-interleaved parity)
Block-interleaved parity organization uses block-level striping, like RAID 0. In addition, it keeps
a parity block on a separate disk for corresponding blocks from N other disks. If one of the disks
fails, the parity block can be used with the corresponding blocks from the other disks to restore
the blocks of the failed disk. The transfer rate for large reads is high, since all the disks can be
read in parallel; large writes also have high transfer rates, since the data and parity can be written
in parallel.
RAID level 5 (Block-interleaved distributed parity)
Block-interleaved distributed-parity organization improves on level 4 by partitioning data and
parity among all N+1 disks, instead of storing data in N disks and parity in one disk. In level 5,
all disks can participate in satisfying read requests, so level 5 increases the total number of requests that can
be met in a given amount of time. For each set of N logical blocks, one of the disks stores the
parity, and the other N disks store the blocks.
Figure below shows the setup. The P’s are distributed across all the disks. For example, with an
array of 5 disks, the parity block, labeled Pk , for logical blocks 4k, 4k + 1, 4k + 2, 4k + 3 is
stored in disk k mod 5; the corresponding blocks of the other four disks store the 4 data blocks 4k
to 4k+3.
The following table indicates how the first 20 blocks, numbered 0 to 19, and their parity
blocks are laid out (derived from the rule above); the pattern repeats for further blocks.

          Disk 0   Disk 1   Disk 2   Disk 3   Disk 4
          P0       0        1        2        3
          4        P1       5        6        7
          8        9        P2       10       11
          12       13       14       P3       15
          16       17       18       19       P4

RAID level 6 (P + Q redundancy)


The P + Q redundancy scheme is much like RAID level 5, but stores extra redundant
information to guard against multiple disk failures. Instead of parity, level 6 uses error-
correcting codes such as Reed–Solomon codes. In the scheme shown in the figure, 2 bits of redundant
data are stored for every 4 bits of data (unlike the 1 parity bit in level 5), and the system can
tolerate two disk failures.
III. File Organization
A file is organized logically as a sequence of records. These records are mapped onto disk
blocks. Each file is also logically partitioned into fixed-length storage units called blocks, which
are the units of both storage allocation and data transfer. A block may contain several records. In
addition, we shall require that each record is entirely contained in a single block; that is, no
record is contained partly in one block and partly in another.
This restriction simplifies and speeds up access to data items.
3.1 Fixed-Length Records
Instead of allocating a variable number of bytes for the attributes, we allocate the maximum
number of bytes that each attribute can hold.
As an example, let us consider a file of instructor records for our university database. Each
record of this file is defined (in pseudocode) as:

    type instructor = record
        ID varchar (5);
        name varchar (20);
        dept_name varchar (20);
        salary numeric (8,2);
    end

The instructor record is 53 bytes long. A simple approach is to use the first 53 bytes for the first
record, the next 53 bytes for the second record, and so on. However, there are two problems with
this simple approach:
1. Unless the block size happens to be a multiple of 53, some records will cross block
boundaries; that is, part of a record will be stored in one block and part in another, so
reading or writing such a record requires two block accesses.
2. It is difficult to delete a record from this structure. The space occupied by the deleted
record must either be filled with some other record of the file, or we must have a way of
marking deleted records so that they can be ignored.

To avoid the first problem, we allocate only as many records to a block as would fit entirely in
the block. Any remaining bytes of each block are left unused.
When a record is deleted, we could move the record that came after it into the space formerly
occupied by the deleted record, and so on, until every record following the deleted record has
been moved ahead.
It is undesirable to move records to occupy the space freed by a deleted record, since doing so
requires additional block accesses. Thus, we need to introduce an additional structure.
At the beginning of the file, we allocate a certain number of bytes as a file header. We store
there the address of the first record whose contents are deleted. We use this first record to store
the address of the second available record, and so on. The deleted records thus form a linked list,
which is often referred to as a free list.

On insertion of a new record, we use the record pointed to by the header. We change the header
pointer to point to the next available record. If no space is available, we add the new record to
the end of the file.
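A minimal in-memory sketch of this free-list scheme (the class and its names are illustrative, not the text's implementation): the header holds the first free slot, each deleted slot holds the index of the next free slot, and insertion reuses the slot the header points to or else appends at the end of the file.

class FixedLengthFile:
    FREE_END = -1                        # marks the end of the free list

    def __init__(self):
        self.records = []                # slot -> record, or index of the next free slot
        self.free_head = self.FREE_END   # file header: first deleted (free) slot

    def delete(self, slot):
        # Chain the freed slot onto the front of the free list.
        self.records[slot] = self.free_head
        self.free_head = slot

    def insert(self, record):
        if self.free_head != self.FREE_END:
            slot = self.free_head                  # reuse the record the header points to
            self.free_head = self.records[slot]    # header now points to the next free slot
            self.records[slot] = record
        else:
            self.records.append(record)            # no free record: add at the end of the file
            slot = len(self.records) - 1
        return slot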
3.2 Variable-Length Records
Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields, such as arrays or multisets.
The representation of a record with variable-length attributes typically has two parts:
- an initial part with fixed-length attributes
- data for the variable-length attributes.
Fixed-length attributes, such as numeric values, dates, or fixed-length character strings, are
allocated as many bytes as required to store their value.
Variable-length attributes, such as varchar types, are represented in the initial part of the record
by a pair (offset, length), where offset denotes where the data for that attribute begins within the
record, and length is the length in bytes of the variable-sized attribute. The values for these
attributes are stored consecutively, after the initial fixed-length part of the record.
Thus, the initial part of the record stores a fixed size of information about each attribute,
whether it is fixed-length or variable-length.

The figure shows an instructor record, whose first three attributes ID, name, and dept name are
variable-length strings, and whose fourth attribute salary is a fixed-size number.
The figure also illustrates the use of a null bitmap, which indicates which attributes of the
record have a null value. In this particular record, if the salary were null, the fourth bit of the
bitmap would be set to 1, and the salary value stored in bytes 12 through 19 would be ignored.
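A minimal sketch of this layout (the field widths, sample values, and function are assumptions for illustration, not the text's exact format): three 2-byte (offset, length) pairs, then the 8-byte salary, then a one-byte null bitmap, followed by the variable-length data.

import struct

def encode_instructor(ID, name, dept_name, salary):
    varying = [ID.encode(), name.encode(), dept_name.encode()]
    header_fmt = "<hhhhhhdB"                    # 3 (offset, length) pairs, salary, null bitmap
    header_size = struct.calcsize(header_fmt)   # 21 bytes: the variable-length data starts here

    pairs, offset = [], header_size
    for value in varying:
        pairs.extend([offset, len(value)])      # where this attribute's data begins, and its length
        offset += len(value)

    null_bitmap = 0                             # no attribute is null in this example
    return struct.pack(header_fmt, *pairs, salary, null_bitmap) + b"".join(varying)

record = encode_instructor("10101", "Srinivasan", "Comp. Sci.", 65000.0)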
Slotted page structure :
We next address the problem of storing variable-length records in a block. The slotted-
page structure is commonly used for organizing records within a block, and is shown in the figure.
There is a header at the beginning of each block, containing the following information:
1. The number of record entries in the header.
2. The end of free space in the block.
3. An array whose entries contain the location and size of each record.
The actual records are allocated contiguously in the block, starting from the end of the block. The
free space in the block is contiguous, between the final entry in the header array, and the first
record. If a record is inserted, space is allocated for it at the end of free space, and an entry
containing its size and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its size is
set to −1, for example). Further, the records in the block before the deleted record are moved, so
that the free space created by the deletion gets occupied, and all free space is again between the
final entry in the header array and the first record. The end-of-free-space pointer in the header is
appropriately updated as well.
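A minimal in-memory sketch of the slotted-page structure (the class and its names are illustrative): records are allocated from the end of the block toward the header, the header array stores each record's location and size, and deletion marks the entry with size −1.

class SlottedPage:
    def __init__(self, block_size=4096):
        self.block = bytearray(block_size)
        self.entries = []              # header array: (location, size) of each record
        self.free_end = block_size     # end of free space; records grow downward from here

    def insert(self, record):
        self.free_end -= len(record)                        # allocate at the end of free space
        self.block[self.free_end:self.free_end + len(record)] = record
        self.entries.append((self.free_end, len(record)))   # add an entry to the header
        return len(self.entries) - 1                        # slot number of the new record

    def delete(self, slot):
        location, size = self.entries[slot]
        self.entries[slot] = (location, -1)   # mark the entry deleted (size set to -1)
        # A full implementation would also slide the records stored before this one
        # toward the end of the block and update free_end and the affected entries,
        # so that all free space is again contiguous.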
IV. Organization of Records in Files
A relation is a set of records. Several of the possible ways of organizing records in files are:
• Heap file organization. Any record can be placed anywhere in the file where there is space for
the record. There is no ordering of records. Typically, there is a single file for each relation.
• Sequential file organization. Records are stored in sequential order, according to the value of
a “search key” of each record.
• Hashing file organization. A hash function is computed on some attribute of each record. The
result of the hash function specifies in which block of the file the record should be placed.
Sequential File Organization
A sequential file is designed for efficient processing of records in sorted order based on some
search key. A search key is any attribute or set of attributes; it need not be the primary key, or
even a superkey.
To permit fast retrieval of records in search-key order, we chain together records by pointers.
The pointer in each record points to the next record in search-key order. Furthermore, to
minimize the number of block accesses in sequential file processing, we store records physically
in search-key order, or as close to search-key order as possible.
Figure shows a sequential file of instructor records taken from our university example. In that
example, the records are stored in search-key order, using ID as the search key.
The sequential file organization allows records to be read in sorted order; that can be useful for
display purposes, as well as for certain query-processing algorithms.
Insertion & Deletion :
We can manage deletion by using pointer chains, as we saw in the case of fixed-length records.
For insertion, we apply the following rules:
1. Locate the record in the file that comes before the record to be inserted in search-key order.
2. If there is a free record (that is, space left after a deletion) within the same block as this record,
insert the new record there. Otherwise, insert the new record in an overflow block. In either case,
adjust the pointers so as to chain together the records in search-key order.
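A much-simplified sketch of these two rules (illustrative only; the class, its capacity, and the in-memory lists stand in for real blocks and pointer chains):

import bisect

class SequentialFile:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.blocks = [[]]      # main blocks, each holding up to `capacity` keys in order
        self.overflow = []      # overflow block for records that do not fit
        self.chain = []         # stands in for the pointer chain in search-key order

    def insert(self, key):
        # Rule 1: locate the block that should hold the record preceding the new one.
        block = next((b for b in self.blocks if not b or key <= b[-1]), self.blocks[-1])
        # Rule 2: use a free slot in that block if there is one, else an overflow block.
        if len(block) < self.capacity:
            bisect.insort(block, key)
        else:
            self.overflow.append(key)
        # In either case, adjust the chain so the records stay in search-key order.
        bisect.insort(self.chain, key)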

If relatively few records need to be stored in overflow blocks, this approach works well.
Eventually, however, the correspondence between search-key order and physical order may be
totally lost, in which case sequential processing will become much less efficient. At this point,
the file should be reorganized so that it is once again physically in sequential order. Such
reorganizations are costly, and must be done during times when the system load is low. The
frequency with which reorganizations are needed depends on the
frequency of insertion of new records.
Hash File Organization

In a hash file organization, we obtain the address of the disk block containing a desired record
directly by computing a function on the search-key value of the record. We use the term bucket to
denote a unit of storage that can store one or more records. A bucket is typically a disk block, but
could be chosen to be smaller or larger than a disk block.
Formally, let K denote the set of all search-key values, and let B denote the set of all bucket
addresses. A hash function h is a function from K to B.
To perform a lookup on a search-key value Ki, we simply compute h(Ki), then search the bucket
with that address. Suppose that two search keys, K5 and K7, have the same hash value; that is,
h(K5) = h(K7). If we perform a lookup on K5, the bucket h(K5) contains records with search-key
values K5 and records with search key values K7. Thus, we have to check the search-key value
of every record in the bucket to verify that the record is one that we want.
To insert a record with search key Ki, we compute h(Ki), which gives the address of the bucket
for that record. Assume for now that there is space in the bucket to store the record. Then, the
record is stored in that bucket.
Deletion is equally straightforward. If the search-key value of the record to be deleted is Ki, we
compute h(Ki), then search the corresponding bucket for that record, and delete the record from
the bucket.
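A minimal in-memory sketch of these bucket operations (the class and its hash-function parameter are illustrative): each bucket is a list, and lookup still compares search-key values because several keys may hash to the same bucket.

class HashFile:
    def __init__(self, nbuckets, h):
        self.h = h                                     # hash function from search keys to 0..nbuckets-1
        self.buckets = [[] for _ in range(nbuckets)]   # one bucket (here, a list) per address

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Check every record in the bucket: other keys may share this bucket.
        return [r for k, r in self.buckets[self.h(key)] if k == key]

    def delete(self, key):
        bucket = self.buckets[self.h(key)]
        bucket[:] = [(k, r) for k, r in bucket if k != key]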
An ideal hash function distributes the stored keys uniformly across all the buckets, so that every
bucket has the same number of records. We want to choose a hash function that assigns search-
key values to buckets in such a way that the distribution has these qualities:
• The distribution is uniform. That is, the hash function assigns each bucket the
same number of search-key values from the set of all possible search-key values.
• The distribution is random. That is, in the average case, each bucket will have nearly the
same number of values assigned to it, regardless of the actual distribution
of search-key values.
Typical hash functions perform computation on the internal binary machine representation
of characters in the search key. A simple hash function of this type first computes the sum of the
binary representations of the characters of a key, then returns the sum modulo the number of
buckets.
The figure below shows the application of such a scheme, with eight buckets, to the instructor
file, under the assumption that the ith letter in the alphabet is represented by the integer i.
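A minimal sketch of such a hash function (it sums the characters' byte values rather than using the text's assumption that the ith letter is represented by the integer i, so the bucket numbers will not match the figure):

def simple_hash(key, nbuckets=8):
    # Sum the binary (byte) values of the characters in the key, then take the
    # sum modulo the number of buckets.
    return sum(key.encode()) % nbuckets

# For example, hashing instructor department names into 8 buckets:
for dept in ["Music", "History", "Physics", "Elec. Eng."]:
    print(dept, "-> bucket", simple_hash(dept))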
Bucket Overflow

A bucket overflows when it does not have enough space for all the records assigned to it; this can
happen, for example, when many search keys collide on the same hash value. Since a static hash
function has a fixed number of buckets, overflow must be handled explicitly, for instance by
overflow chaining.

• Overflow Chaining − When buckets are full, a new bucket is allocated for the same hash
result and is linked after the previous one. This mechanism is called Closed Hashing.
• Linear Probing − When a hash function generates an address at which data is already stored, the
next free bucket is allocated to it. This mechanism is called Open Hashing.

Heap File Organization

In a heap file organization, any record can be placed anywhere in the file where there is space for
it; there is no ordering of records, and typically there is a single file for each relation.
