Several types of data storage exist in most computer systems. These storage media are classified by the speed with which data can be accessed, by the cost per unit of data, and by the medium's reliability.
I. Magnetic Disk
Magnetic disks provide the bulk of secondary storage for modern computer systems. A very
large database may require hundreds of disks.
1.1 Physical Characteristics of Disks
Physically, disks are relatively simple. Each disk platter has a flat, circular shape. Its two
surfaces are covered with a magnetic material, and information is recorded on the surfaces.
Platters are made from rigid metal or glass.
When the disk is in use, a drive motor spins it at a constant high speed. There is a read–write
head positioned just above the surface of the platter. The disk surface is logically divided into
tracks, which are subdivided into sectors. A sector is the smallest unit of information that can
be read from or written to the disk.
The read–write head stores information on a sector magnetically as reversals of the direction of
magnetization of the magnetic material. Each side of a platter of a disk has a read–write head that
moves across the platter to access different tracks. A disk typically contains many platters, and the read–write heads of all the platters are mounted on a single assembly called a disk arm, and move together. The disk platters mounted on a spindle and the heads mounted on a disk arm are together known as head–disk assemblies. Since the heads on all the platters move together, when the head on one platter is on the ith track, the heads on all the other platters are also on the ith track of their respective platters. Hence, the ith tracks of all the platters together are called the ith cylinder.
Disk Controller:
A disk controller interfaces between the computer system and the actual hardware of the disk
drive.
Functions of disk controller:
1. A disk controller accepts high-level commands to read or write a sector, and initiates
actions, such as moving the disk arm to the right track and actually reading or writing the
data.
2. Disk controllers also attach a checksum to each sector that is written; the checksum is computed from the data written to the sector. When the sector is read back, the controller computes the checksum again from the retrieved data and compares it with the stored checksum. If the two checksums differ, the data are likely corrupted; the controller retries the read several times, and if the mismatch persists, it signals a read failure.
3. Another interesting task that disk controllers perform is remapping of bad sectors. If the
controller detects that a sector is damaged when the disk is initially formatted, or when an
attempt is made to write the sector, it can logically map the sector to a different physical
location.
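The checksum mechanism in point 2 can be sketched as follows. A simple additive checksum stands in here for the controller's actual error-detecting code, which is hardware-specific (real controllers typically use CRCs); the function names are illustrative:

```python
def checksum(data: bytes) -> int:
    # Simple additive checksum; real controllers use stronger codes (e.g. CRCs).
    return sum(data) % 256

def write_sector(data: bytes) -> bytes:
    # The controller appends a checksum computed from the data being written.
    return data + bytes([checksum(data)])

def read_sector(stored: bytes) -> bytes:
    data, stored_sum = stored[:-1], stored[-1]
    # Recompute the checksum from the retrieved data and compare with the stored one;
    # a mismatch indicates the sector's contents are corrupted.
    if checksum(data) != stored_sum:
        raise IOError("read failure: checksum mismatch")
    return data
```

A real controller would retry the read several times before signaling the failure; the sketch raises immediately to keep the comparison step visible.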
Disks are connected to a computer system through a high-speed interconnection. There are a number of common interfaces for connecting disks to computers, of which the most commonly used today are:
(1) SATA (serial ATA)
(2) SCSI (small-computer-system interconnect)
(3) SAS (serial attached SCSI)
(4) the Fibre Channel interface
3.1 Fixed-Length Records
A simple approach stores fixed-length records one after another. Two problems arise: a record may cross a block boundary, and deleting a record leaves a gap. To avoid the first problem, we allocate only as many records to a block as fit entirely in the block. Any remaining bytes of each block are left unused.
When a record is deleted, we could move the record that came after it into the space formerly
occupied by the deleted record, and so on, until every record following the deleted record has
been moved ahead.
It is undesirable to move records to occupy the space freed by a deleted record, since doing so
requires additional block accesses. Thus, we need to introduce an additional structure.
At the beginning of the file, we allocate a certain number of bytes as a file header. We store there the address of the first record whose contents are deleted. We use this first record to store the address of the second available record, and so on. The deleted records thus form a linked list, which is often referred to as a free list.
On insertion of a new record, we use the record pointed to by the header. We change the header
pointer to point to the next available record. If no space is available, we add the new record to
the end of the file.
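The free-list scheme above can be sketched as follows. The file is modeled as a list of fixed-size record slots and the header as a single pointer field; the class and field names are illustrative assumptions, not part of any real system:

```python
FREE_END = -1  # sentinel marking the end of the free list

class FixedLengthFile:
    def __init__(self):
        self.slots = []            # each slot holds a record, or a "next free" pointer
        self.free_head = FREE_END  # file-header field: address of first deleted record

    def delete(self, i):
        # The freed slot stores the address of the previously first free slot,
        # extending the linked list of deleted records.
        self.slots[i] = self.free_head
        self.free_head = i

    def insert(self, record):
        if self.free_head != FREE_END:
            i = self.free_head              # reuse the record pointed to by the header
            self.free_head = self.slots[i]  # header now points to the next free record
            self.slots[i] = record
        else:
            self.slots.append(record)       # no space available: add at end of file
            i = len(self.slots) - 1
        return i
```

Deleting slot 1 and then inserting a new record reuses slot 1 rather than growing the file, exactly as the header-pointer rule prescribes.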
3.2 Variable-Length Records
Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields, such as arrays or multisets.
The representation of a record with variable-length attributes typically has two parts:
• an initial part with fixed-length attributes
• data for the variable-length attributes
Fixed-length attributes, such as numeric values, dates, or fixed-length character strings, are allocated as many bytes as required to store their value.
Variable-length attributes, such as varchar types, are represented in the initial part of the record
by a pair (offset, length), where offset denotes where the data for that attribute begins within the
record, and length is the length in bytes of the variable-sized attribute. The values for these
attributes are stored consecutively, after the initial fixed-length part of the record.
Thus, the initial part of the record stores a fixed size of information about each attribute,
whether it is fixed-length or variable-length.
The figure shows an instructor record, whose first three attributes ID, name, and dept name are variable-length strings, and whose fourth attribute salary is a fixed-sized number. The figure also illustrates the use of a null bitmap, which indicates which attributes of the record have a null value. In this particular record, if the salary were null, the fourth bit of the bitmap would be set to 1, and the salary value stored in bytes 12 through 19 would be ignored.
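A minimal sketch of this (offset, length) layout, assuming three variable-length string attributes followed by one 8-byte salary; the 2-byte integers used for each pair and the little-endian encoding are illustrative choices, not the figure's exact byte layout:

```python
import struct

VARS = 3              # number of variable-length attributes (ID, name, dept_name)
FIXED = VARS * 4 + 8  # three 2-byte (offset, length) pairs + 8-byte salary

def encode(ID, name, dept, salary):
    vars = [s.encode() for s in (ID, name, dept)]
    header = b""
    off = FIXED  # variable-length data is stored consecutively after the fixed part
    for v in vars:
        header += struct.pack("<HH", off, len(v))  # (offset, length) pair
        off += len(v)
    header += struct.pack("<d", salary)            # fixed-length attribute
    return header + b"".join(vars)

def decode(rec):
    fields = []
    for i in range(VARS):
        off, length = struct.unpack_from("<HH", rec, i * 4)
        fields.append(rec[off:off + length].decode())
    salary, = struct.unpack_from("<d", rec, VARS * 4)
    return (*fields, salary)
```

With these sizes the salary happens to occupy bytes 12 through 19 of the record, matching the layout described above.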
Slotted Page Structure:
We next address the problem of storing variable-length records in a block. The slotted-
page structure is commonly used for organizing records within a block, and is shown in Figure.
There is a header at the beginning of each block, containing the following information:
1. The number of record entries in the header.
2. The end of free space in the block.
3. An array whose entries contain the location and size of each record.
The actual records are allocated contiguously in the block, starting from the end of the block. The
free space in the block is contiguous, between the final entry in the header array, and the first
record. If a record is inserted, space is allocated for it at the end of free space, and an entry
containing its size and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its size is
set to −1, for example). Further, the records in the block before the deleted record are moved, so
that the free space created by the deletion gets occupied, and all free space is again between the
final entry in the header array and the first record. The end-of-free-space pointer in the header is
appropriately updated as well.
IV. Organization of Records in Files
A relation is a set of records. Several of the possible ways of organizing records in files are:
• Heap file organization. Any record can be placed anywhere in the file where there is space for
the record. There is no ordering of records. Typically, there is a single file for each relation.
• Sequential file organization. Records are stored in sequential order, according to the value of
a “search key” of each record.
• Hashing file organization. A hash function is computed on some attribute of each record. The
result of the hash function specifies in which block of the file the record should be placed.
Sequential File Organization
A sequential file is designed for efficient processing of records in sorted order based on some
search key. A search key is any attribute or set of attributes; it need not be the primary key, or
even a superkey.
To permit fast retrieval of records in search-key order, we chain together records by pointers.
The pointer in each record points to the next record in search-key order. Furthermore, to
minimize the number of block accesses in sequential file processing, we store records physically
in search-key order, or as close to search-key order as possible.
Figure shows a sequential file of instructor records taken from our university example. In that
example, the records are stored in search-key order, using ID as the search key.
The sequential file organization allows records to be read in sorted order; that can be useful for
display purposes, as well as for certain query-processing algorithms.
Insertion and Deletion:
We can manage deletion by using pointer chains, as we saw in the case of fixed-length records.
For insertion, we apply the following rules:
1. Locate the record in the file that comes before the record to be inserted in search-key order.
2. If there is a free record (that is, space left after a deletion) within the same block as this record,
insert the new record there. Otherwise, insert the new record in an overflow block. In either case,
adjust the pointers so as to chain together the records in search-key order.
If relatively few records need to be stored in overflow blocks, this approach works well. Eventually, however, the correspondence between search-key order and physical order may be totally lost, in which case sequential processing becomes much less efficient. At this point, the file should be reorganized so that it is once again physically in sequential order. Such reorganizations are costly and must be done during times when the system load is low. The frequency with which reorganizations are needed depends on the frequency of insertion of new records.
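The two insertion rules above can be sketched as follows. Records are reduced to their search-key values, and appended records stand in for the overflow blocks; the class and field names are illustrative assumptions:

```python
class SequentialFile:
    """Sketch: records chained by pointers in search-key order; new records
    that cannot be placed near their predecessor go to an overflow area."""

    def __init__(self, sorted_keys):
        self.records = list(sorted_keys)  # initially physically in search-key order
        n = len(self.records)
        # Pointer chain: each record points to the next in search-key order.
        self.next = [i + 1 if i + 1 < n else None for i in range(n)]
        self.head = 0 if n else None

    def insert(self, key):
        self.records.append(key)  # overflow: new record goes at the end of the file
        new = len(self.records) - 1
        # Rule 1: locate the record that precedes `key` in search-key order.
        prev, cur = None, self.head
        while cur is not None and self.records[cur] <= key:
            prev, cur = cur, self.next[cur]
        # Rule 2: adjust pointers to keep the chain in search-key order.
        if prev is None:
            self.next.append(self.head)
            self.head = new
        else:
            self.next.append(self.next[prev])
            self.next[prev] = new

    def scan(self):
        # Follow the pointer chain: records come out in search-key order
        # even though inserted records sit physically at the end of the file.
        cur, out = self.head, []
        while cur is not None:
            out.append(self.records[cur])
            cur = self.next[cur]
        return out
```

After a few insertions the physical order (list order) diverges from the chain order, which is exactly why periodic reorganization is eventually needed.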
Hash File Organization
In a hash file organization, we obtain the address of the disk block containing a desired record
directly by computing a function on the search-key value of the record. We use the term bucket to denote a unit of storage that can store one or more records. A bucket is typically a disk block, but could be chosen to be smaller or larger than a disk block.
Formally, let K denote the set of all search-key values, and let B denote the set of all bucket addresses. A hash function h is a function from K to B.
To perform a lookup on a search-key value Ki, we simply compute h(Ki), then search the bucket
with that address. Suppose that two search keys, K5 and K7, have the same hash value; that is,
h(K5) = h(K7). If we perform a lookup on K5, the bucket h(K5) contains records with search-key
values K5 and records with search key values K7. Thus, we have to check the search-key value
of every record in the bucket to verify that the record is one that we want.
To insert a record with search key Ki, we compute h(Ki), which gives the address of the bucket
for that record. Assume for now that there is space in the bucket to store the record. Then, the
record is stored in that bucket.
Deletion is equally straightforward. If the search-key value of the record to be deleted is Ki, we
compute h(Ki), then search the corresponding bucket for that record, and delete the record from
the bucket.
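The lookup, insertion, and deletion steps above can be sketched as follows. Buckets are modeled as Python lists and the hash function is passed in; both are illustrative stand-ins for disk blocks and a real h:

```python
class HashFile:
    """Sketch of a hash file: h(key) gives the bucket address; a bucket may
    hold records whose distinct keys happen to hash to the same value."""

    def __init__(self, nbuckets, h):
        self.h = h
        self.buckets = [[] for _ in range(nbuckets)]

    def insert(self, key, record):
        # Assume for now that the bucket has space (overflow is handled separately).
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Every record in bucket h(key) must be checked: another key may
        # collide with this one, i.e. h(K5) == h(K7) with K5 != K7.
        return [r for k, r in self.buckets[self.h(key)] if k == key]

    def delete(self, key):
        b = self.buckets[self.h(key)]
        b[:] = [(k, r) for k, r in b if k != key]
```

In the test below, keys 5 and 13 collide under h(k) = k mod 8, so the lookup's per-record key check is what keeps the two apart.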
An ideal hash function distributes the stored keys uniformly across all the buckets, so that every bucket has the same number of records. We want to choose a hash function that assigns search-key values to buckets in such a way that the distribution has these qualities:
• The distribution is uniform. That is, the hash function assigns each bucket the
same number of search-key values from the set of all possible search-key values.
• The distribution is random. That is, in the average case, each bucket will have nearly the
same number of values assigned to it, regardless of the actual distribution
of search-key values.
Typical hash functions perform computation on the internal binary machine representation
of characters in the search key. A simple hash function of this type first computes the sum of the
binary representations of the characters of a key, then returns the sum modulo the number of
buckets.
The figure below shows the application of such a scheme, with eight buckets, to the instructor
file, under the assumption that the ith letter in the alphabet is represented by the integer i.
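The character-sum scheme can be sketched as follows, here using the characters' byte values rather than the 1-based alphabet positions assumed in the figure:

```python
def simple_hash(key: str, nbuckets: int) -> int:
    # Sum the binary representations of the key's characters,
    # then return the sum modulo the number of buckets.
    return sum(key.encode()) % nbuckets
```

The result is always a valid bucket address in the range 0 to nbuckets − 1, though such a simple function does little to guarantee a uniform or random distribution.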
Bucket Overflow
A bucket overflow, also known as a collision, occurs when a record must be inserted into a bucket that is already full. With a static hash function, overflow cannot be avoided entirely; it is handled as follows:
• Overflow chaining − When a bucket is full, an overflow bucket is allocated for the same hash result and is linked after the previous one. This mechanism is called closed hashing.
• Linear probing − When the hash function generates an address at which data is already stored, the next free bucket is allocated instead. This mechanism is called open hashing.
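Overflow chaining can be sketched as follows; the bucket capacity of 2 is an illustrative assumption standing in for the number of records a disk block can hold:

```python
class Bucket:
    """Fixed-capacity bucket; a full bucket links to an overflow bucket."""
    CAPACITY = 2  # illustrative: a real bucket holds a disk block's worth of records

    def __init__(self):
        self.records = []
        self.overflow = None  # pointer to the chained overflow bucket, if any

def insert(bucket, record):
    # Walk the overflow chain until a bucket with room is found,
    # allocating and linking a new overflow bucket if necessary.
    while len(bucket.records) >= Bucket.CAPACITY:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.records.append(record)

def lookup_all(bucket):
    # A lookup must follow the whole chain for the hash result.
    out = []
    while bucket is not None:
        out.extend(bucket.records)
        bucket = bucket.overflow
    return out
```

Because all overflow buckets for a hash value hang off the primary bucket, lookups stay correct at the cost of following the chain.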