UNIT-1
1) Cache
2) Main memory
3) Flash memory
4) Magnetic disks
5) Optical storage (CD or DVD)
6) Tape storage
CACHE : The most costly and fastest form of storage. Usually very small and volatile; managed
by the computer system hardware.
MAIN MEMORY (MM) : The storage area for data available to be operated on.
General-purpose machine instructions operate on main memory.
Contents of main memory are usually lost in a power failure or "crash".
Usually too small (even with megabytes) and too expensive to store the entire database.
FLASH MEMORY : EEPROM (electrically erasable programmable read-only memory).
Data in flash memory survives power failure.
Reading data from flash memory takes about 10 nanoseconds (roughly as fast as from main memory),
while writing data into flash memory is more complicated and slower: a write takes
about 4-10 microseconds.
To overwrite what has been written, one must first erase an entire bank of the memory, and the memory may
support only a limited number of erase cycles.
It has found its popularity as a replacement for disks for storing small volumes of data
(5-10 megabytes).
Cost per unit of storage is roughly similar to main memory.
Widely used in embedded devices such as digital cameras.
MAGNETIC DISK : primary medium for long term storage
Typically the entire database is stored on disk.
Data is stored on spinning disk, and read/written magnetically
Data must be moved from disk to main memory in order for the data to be operated on.
After operations are performed, data must be copied back to disk if any changes were made (much
slower access than main memory).
Disk storage is called direct access storage as it is possible to read data on the disk in any order
(unlike sequential access).
Disk storage usually survives power failures and system crashes (data is very rarely destroyed).
Capacities range up to roughly 400 GB currently.
– Much larger capacity, and much lower cost per byte, than main memory/flash memory
– Growing constantly and rapidly with technology improvements (factor of 2 to 3 every 2
years)
OPTICAL STORAGE : CD-ROM (compact-disk read-only memory), WORM (write-once read-many)
disk (for archival storage of data), and Juke box (containing a few drives and numerous disks loaded on
demand). Non-volatile, data is read optically from a spinning disk using a laser.
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms
• Write-once, read-many (WORM) optical disks used for archival storage (CD-R, DVD-R, DVD+R)
• Multiple write versions also available (CD-RW, DVD-RW, DVD+RW)
• Reads and writes are slower than with magnetic disk
• Juke-box : systems with large numbers of removable disks, a few drives, and a mechanism for
automatic loading/unloading of disks, used for storing large volumes of data.
TAPE STORAGE : used primarily for backup and archival data.
• Cheaper, but much slower access, since tape must be read sequentially from the beginning.
• Used as protection from disk failures!
• sequential-access – much slower than disk, very high capacity (40 to 300 GB tapes available), tape
can be removed from drive ⇒ storage costs much cheaper than disk, but drives are expensive.
• Tape jukeboxes available for storing massive amounts of data, hundreds of terabytes
(1 terabyte = 10^12 bytes) to even a petabyte (1 petabyte = 10^15 bytes)
• A read-write head moves to the track that contains the block to be transferred. Disk rotation then moves the
block under the read-write head for reading or writing.
• A physical disk block (hardware) address consists of a cylinder number (the imaginary collection of tracks of
the same radius on all recorded surfaces), the track or surface number (within the cylinder),
and the block number (within the track).
• Reading or writing a disk block is time consuming because of the seek time ‘s’ and
rotational delay (latency) ‘rd’.
• Double buffering can be used to speed up the transfer of contiguous disk blocks.
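The cost of a single block access sketched above (seek time ‘s’ plus rotational delay ‘rd’ plus transfer time) can be put into rough numbers. A minimal sketch; all drive parameters below are illustrative assumptions, not figures from the text:

```python
# Estimate the time to read one disk block.
# All drive parameters here are illustrative assumptions.

def block_access_time_ms(avg_seek_ms=8.0, rpm=7200,
                         transfer_rate_mb_s=100.0, block_kb=4):
    # Average rotational delay: half a revolution.
    rotational_delay_ms = 0.5 * (60_000 / rpm)
    # Time for the block itself to stream past the head.
    transfer_ms = (block_kb / 1024) / transfer_rate_mb_s * 1000
    return avg_seek_ms + rotational_delay_ms + transfer_ms

t = block_access_time_ms()  # about 12 ms with the assumed parameters
```

With these assumed numbers the total is about 12 ms, almost all of it seek and rotation; streaming a 4 KB block takes well under 0.1 ms, which is why arm scheduling and buffering matter so much.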
OR
1. The storage capacity of a single disk ranges from 10MB to 10GB. A typical commercial database may
require hundreds of disks.
2. The figure above shows a moving-head disk mechanism.
Read-write head: positioned very close to the platter surface (almost touching it), Reads or writes
magnetically coded information.
o Each disk platter has a flat circular shape. Its two surfaces are covered with a magnetic material, and
information is recorded on the surfaces. The platters of hard disks are made from rigid metal or
glass, while floppy disks are made from flexible material.
o The disk surface is logically divided into tracks, which are subdivided into sectors. A sector
(varying from 32 bytes to 4096 bytes, usually 512 bytes) is the smallest unit of information that can
be read from or written to disk. There are 4-32 sectors per track and 20-1500 tracks per disk
surface. Typically 200 (on inner tracks) to 400 (outer tracks) sectors per track.
o The arm can be positioned over any one of the tracks.
o The platter is spun at high speed.
o To read information, the arm is positioned over the correct track.
o When the data to be accessed passes under the head, the read or write operation is performed.
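The geometry figures above allow a quick back-of-the-envelope capacity estimate: bytes per surface = sector size × sectors per track × tracks per surface. A sketch using the mid-range numbers quoted (512-byte sectors, 400 sectors per track, 1500 tracks per surface); the surface count is an assumption:

```python
# Rough disk-capacity arithmetic from the geometry above.
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 400      # outer-track figure from the notes
TRACKS_PER_SURFACE = 1500
SURFACES = 8                 # assumed: 4 platters, 2 surfaces each

surface_bytes = SECTOR_BYTES * SECTORS_PER_TRACK * TRACKS_PER_SURFACE
disk_bytes = surface_bytes * SURFACES
# 512 * 400 * 1500 = 307,200,000 bytes per surface (~293 MB)
```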
3. A disk typically contains multiple platters (see Figure). The read-write heads of all the platters are mounted
on a single assembly called a disk arm, and move together. To read/write a sector: the disk arm swings to
position the head on the right track, and the platter spins continually; data is read/written as the sector passes under the head.
Head-disk assemblies: multiple disk platters on a single spindle (typically 2 to 4), one head per platter,
mounted on a common arm.
o Multiple disk arms are moved as a unit by the actuator.
o Each arm has two heads, to read disks above and below it.
o The set of tracks over which the heads are located forms a cylinder.
Cylinder i consists of the i-th track of all the platters.
o This cylinder holds the data that is accessible within the disk latency time.
o It is clearly sensible to store related data in the same or adjacent cylinders.
4. Disk platters range from 1.8" to 14" in diameter; 5¼" and 3½" disks dominate because they have lower cost
and faster seek times than larger disks, yet still provide high storage capacity.
5. A disk controller interfaces between the computer system and the actual hardware of the disk drive. It
accepts commands to read/write a sector, and initiates actions such as moving the disk arm to the right track and
actually reading or writing the data. Disk controllers also attach checksums to each sector to verify that data
is read back correctly (detecting read errors), and ensure successful writing by reading back each sector after writing it.
6. Remapping of bad sectors: If a controller detects that a sector is damaged when the disk is initially
formatted, or when an attempt is made to write the sector, it can logically map the sector to a different
physical location.
7. SCSI (Small Computer System Interface) is commonly used to connect disks to PCs and workstations.
Mainframe and server systems usually have a faster and more expensive bus to connect to the disks.
8. Head crash: the read-write head touches and scratches the platter surface, usually destroying the data there and often causing the entire disk to fail.
9. A fixed-head disk has a separate head for each track -- very many heads, very expensive. Multiple disk
arms: allow more than one track to be accessed at a time. Both were used in high performance mainframe
systems but are relatively rare today.
Disk Subsystem:
METHODS:
• Disk-arm Scheduling: requests for several blocks may be sped up by servicing them in the order they
will pass under the head.
– If the blocks are on different cylinders, it is advantageous to ask for them in an order that minimizes
disk-arm movement
– Elevator algorithm -- move the disk arm in one direction until all requests from that direction are
satisfied, then reverse and repeat
– Sequential access is 1-2 orders of magnitude faster than random access.
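The elevator algorithm described above can be sketched directly: service all pending cylinder requests that lie in the current direction of arm travel, then reverse. A minimal sketch (the request list and head position are assumed inputs):

```python
def elevator_order(requests, head, direction=+1):
    """Return cylinders in the order the elevator (SCAN) algorithm
    services them: sweep one way, then reverse direction."""
    ahead  = sorted(c for c in requests if (c - head) * direction >= 0)
    behind = sorted(c for c in requests if (c - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]   # upward sweep, then back down
    return ahead[::-1] + behind       # downward sweep, then back up
```

For example, with pending cylinders [98, 183, 37, 122, 14, 124, 65, 67] and the head at cylinder 53 moving upward, the service order is 65, 67, 98, 122, 124, 183, then 37 and 14 on the return sweep.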
• Non-volatile write buffers
– store written data in a RAM buffer rather than on disk
– write the buffer whenever it becomes full or when no other disk requests are pending
– buffer must be non-volatile to protect from power failure
• called non-volatile random-access memory (NV-RAM)
• typically implemented with battery-backed-up RAM
– dramatic speedup on writes; with a reasonable-sized buffer write latency essentially disappears
• File organization (Clustering): reduce access time by organizing blocks on disk in a way that corresponds
closely to the way we expect them to be accessed
– sequential files should be kept organized sequentially
– hierarchical files should be organized with parent records next to their children
– for joining tables (relations) put the joining tuples next to each other
– over time fragmentation can become an issue
• restoration of disk structure (copy and rewrite, reordered) controls fragmentation
Tertiary Storage
Optical Disks:
1. CD-ROM has become a popular medium for distributing software, multimedia data, and other electronic
published information.
2. Capacity of CD-ROM: 700 MB. Disks are cheap to mass-produce, and so are the drives.
3. CD-ROM: much longer seek time (250 msec), lower rotation speed (400 rpm), leading to a lower
data-transfer rate (about 150 KB/sec). Drives that spin at multiples of the standard audio-CD speed are available.
4. Recently, a new optical format, the digital video disk (DVD), has become standard. These disks hold between 4.7
and 17 GB of data.
5. WORM (write-once, read-many) disks are popular for archival storage of data since they have a high
capacity (about 700 MB), a longer lifetime than hard disks, and can be removed from the drive; being
write-once, they are also hard to tamper with.
Magnetic Tapes:
1. Long history, slow, and limited to sequential access, and thus are used for backup, storage for infrequent
access, and off-line medium for system transfer.
2. Moving to the correct spot may take minutes, but once positioned, tape drives can write data at densities and
speeds approaching those of disk drives.
3. The 8mm tape drive has the highest density; a 350-foot tape can store 5 GB of data.
4. Popularly used for storage of large volumes of data, such as video, image, or remote sensing data.
Storage Access
1. Each file is partitioned into fixed-length storage units, called blocks, which are the units of both storage
allocation and data transfer.
2. It is desirable to keep as many blocks as possible in main memory. Usually, we cannot keep all blocks in
main memory, so we need to manage the allocation of available main memory space.
3. We need to use disk storage for the database, and to transfer blocks of data between main memory and disk.
We also want to minimize the number of such transfers, as they are time-consuming.
4. The buffer is the part of main memory available for storage of copies of disk blocks.
5. Buffer manager - subsystem responsible for allocating buffer space in main memory and handling block
transfer between buffer and disk.
Buffer Manager
• The buffer pool is the part of main memory allocated for temporarily storing disk blocks read
from disk and made available to the CPU.
• The buffer manager is the subsystem responsible for the allocation and the management of the buffer space
(transparent to users)
• On a process (user) request for a block (page) the buffer manager:
– checks to see if the page is already in the buffer pool
– if so, passes the address to the process
– if not, it loads the page from disk and then passes the address to the process
– loading a page might require clearing (writing out) a page to make space
• The buffer manager handles all requests for blocks of the database.
Buffer replacement policies
The buffer manager must use some sophisticated techniques in order to provide good service:
– Replacement Strategy -- When there is no room left in the buffer, some block must be
removed to make way for the new one. Typical operating system memory management
schemes use a "least recently used" (LRU) method: the block that was referenced least
recently is written back to the disk and removed from the buffer. This can be improved
upon for database applications.
– Pinned Blocks - For the database to be able to recover from crashes, we need to restrict
times when a block may be written back to disk. A block not allowed to be written is said to be pinned.
Many operating systems do not provide support for pinned blocks, yet such a feature is essential if a
database is to be crash resistant.
– Forced Output of Blocks - Sometimes it is necessary to write a block back to disk even though its
buffer space is not needed, (called the forced output of a block.) This is due to the fact that main
memory contents (and thus the buffer) are lost in a crash, while disk data usually survives.
– Toss-immediate strategy: free the space occupied by a block as soon as the final tuple of that
block has been processed. After the final tuple of that block has been processed the block is unpinned
and becomes the most recently used block. This is essentially “toss-immediate” with pinning, and works
very well with joins.
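The interaction of LRU replacement and pinned blocks described above can be sketched as a toy buffer pool. The class and method names are illustrative; a real buffer manager would also track dirty pages and handle concurrency:

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: LRU replacement that skips pinned blocks."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                # block_id -> contents ("on disk")
        self.frames = OrderedDict()     # block_id -> contents, LRU first
        self.pinned = set()

    def request(self, block_id):
        if block_id in self.frames:     # already buffered: mark recently used
            self.frames.move_to_end(block_id)
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = self.disk[block_id]   # "read" from disk
        return self.frames[block_id]

    def _evict(self):
        for victim in self.frames:      # scan from the LRU end
            if victim not in self.pinned:
                self.disk[victim] = self.frames.pop(victim)  # write back
                return
        raise RuntimeError("all buffer frames are pinned")

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)
```

With a capacity of 2, requesting blocks A, B and then C would normally evict A (the least recently used); pinning A first forces B to be evicted instead.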
File Organization:
1. In a file of fixed-length records, a record may cross a block boundary, with part stored in one block and part in another.
• It would then require two block accesses to read or write such a record.
Record 0 Smita Beriwal Operating System 63
Record 1 Nidhi Sharma Computer Network 70
Record 2 Amit Gupta Java 75
Record 3 Sneha Abhyankar Digital Elex 69
Record 4 Gautam Shah DBMS 72
2. When a record is deleted, we could move all successive records up one position, which may require moving a lot
of records.
• We could instead move the last record into the ``hole'' created by the deleted record
• This changes the order the records are in.
• It turns out to be undesirable to move records to occupy freed space, as moving
requires block accesses.
• Also, insertions tend to be more frequent than deletions.
• It is acceptable to leave the space open and wait for a subsequent insertion.
• This leads to a need for additional structure in our file design.
Record 0 Smita Beriwal Operating System 63
Record 2 Amit Gupta Java 75
Record 3 Sneha Abhyankar Digital Elex 69
Record 4 Gautam Shah DBMS 72
3. We can reuse the space freed by deleted records via a free list: the file header stores the address of the first
deleted record, and each deleted record stores the address of the next one, chaining the free slots together, as
in the figure:
Header
Record 0 Smita Beriwal Operating System 63
4. Note: Use of pointers requires careful programming. If a record pointed to is moved or deleted, and that
pointer is not corrected, the pointer becomes a dangling pointer. Records pointed to are called pinned.
5. Fixed-length file insertions and deletions are relatively simple because ``one size fits all''. For variable
length, this is not the case.
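The free-list idea (header points to the first deleted slot, each deleted slot points to the next) can be sketched for fixed-length records as follows; this simplification keeps records in a Python list rather than on disk:

```python
class FixedLengthFile:
    """Toy fixed-length record file with a free list threaded
    through deleted slots; free_head plays the role of the header."""

    def __init__(self):
        self.slots = []        # each entry: a record, a next-free index, or None
        self.free_head = None  # index of the first deleted slot

    def insert(self, record):
        if self.free_head is not None:    # reuse a deleted slot
            slot = self.free_head
            self.free_head = self.slots[slot]
            self.slots[slot] = record
        else:                             # append at end of file
            slot = len(self.slots)
            self.slots.append(record)
        return slot

    def delete(self, slot):
        self.slots[slot] = self.free_head  # chain onto the free list
        self.free_head = slot
```

Deleting a record pushes its slot onto the free list; the next insertion reuses that slot instead of growing the file.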
Variable Length Records
1. Variable-length records arise in a database in several ways:
• Storage of multiple record types in a file.
• Record types allowing variable field size
• Record types allowing repeating fields.
Byte String Representation
2. It is difficult to maintain physical sequential order as records are inserted and deleted.
• Deletion can be managed with the pointer chains.
• Insertion poses problems if no space where new record should go.
• If there is free space left by a deletion within the same block, insert the new record there.
Otherwise, put the new record in an overflow block. In either case, adjust the pointers so as to
chain together the records in search-key order.
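The byte-string representation of variable-length records can be sketched by prefixing each record with its length, so records of different sizes can be packed into one block. The 2-byte big-endian length prefix is an assumed encoding choice:

```python
import struct

def pack_records(records):
    # Each record: 2-byte length prefix followed by its bytes.
    out = bytearray()
    for rec in records:
        data = rec.encode("utf-8")
        out += struct.pack(">H", len(data)) + data
    return bytes(out)

def unpack_records(blob):
    records, i = [], 0
    while i < len(blob):
        (n,) = struct.unpack_from(">H", blob, i)
        records.append(blob[i + 2:i + 2 + n].decode("utf-8"))
        i += 2 + n
    return records
```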