01 Disk

File Organization
Lecture Notes #1
(Disk Organization & Performance)
A guiding scholar should be like a sheep, not like a bird.
The sheep gives its lamb milk; the bird gives its chick regurgitated food.
Textbook & References

File Structures: An Object-Oriented Approach with C++, Michael J. Folk, Bill Zoellick, Greg Riccardi,
Addison Wesley, 1998

Database Design and Implementation, Edward Sciore, John
Wiley, 2009
Definitions-1
A file structure is a combination of representations for data in files and of operations
for accessing the data on disk.

Data structures deal with data in main memory.
File structures deal with data in secondary storage.
Main operations on a file structure: search, add, remove, update, sort (external
sorting), merge
Main metrics: simplicity, complexity, scalability, programmability, and
maintainability
Structures for static files and for dynamic files differ a lot.
Structures differ according to the storage medium as well.
A rough history of access methods:
Sequential search
Simple index search (before 1960)
Tree structures (BST 1960s, AVL 1962, B-tree and B+-tree 1970s)
Simple hashing (before 1980)
Dynamic hashing (after 1980)
The choice of structure is based on:
usage characteristics of the data
data type (basic vs. multi-dimensional data)
physical characteristics of the machine
Definitions-2
A physical file is a particular collection of bytes stored on disk.
A logical file is the view of a physical file from the standpoint of the application
program.
There may be thousands of physical files on a disk, but a program can have
only a limited number of logical files open at once (for example, 20).
The OS makes the connection between physical files and logical files.
Example:
int fd = open (filename, flags[,mode]);
Read/Write: the OS first makes the connection; the program then reads and writes using the
logical file descriptor.
Physical devices as files: the program accesses the file without knowing
whether it comes from disk, tape, another computer,
the keyboard (stdin), the screen (stdout)...
% list.exe > myoutput
% prog1 | prog2
% list | sort
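The open/read/write pattern above can be tried from Python, whose os module wraps the same system calls. This is only a minimal sketch: the file name demo.dat and the helper roundtrip are hypothetical names chosen for illustration.

```python
import os
import tempfile

def roundtrip(payload: bytes) -> bytes:
    """Write payload through one descriptor, then read it back through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.dat")  # hypothetical physical file
        # The OS ties the physical file to a logical descriptor (a small integer).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            os.write(fd, payload)
        finally:
            os.close(fd)
        fd = os.open(path, os.O_RDONLY)
        try:
            return os.read(fd, len(payload))
        finally:
            os.close(fd)
```

After the open call the program only ever names the descriptor, never the physical location of the bytes.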
MAIN GOAL
Increase reliability and speed at the lowest possible cost.
In the real world, data is inevitably scattered over disk pages.

To increase speed, we have to minimize the number of disk accesses by
clustering data (exploiting temporal and spatial locality) as far as possible.

Ideally, get everything you need in a single access.
The physical characteristics of the hardware, together with the data
structures and the algorithms, are used to predict the efficiency of file
operations.
Next we study the physical characteristics of the hardware.
Storage Hierarchy
(Figure: storage hierarchy, fastest level first. Arrow labels on the figure: cost/unit increase, capacity decrease, reliability increase.)
Machine instructions
Cache: 1 MB, 15 cycles (~40 times more expensive)
Main memory: 1-16 GB, 200 cycles (10 nanosec.); volatile media
USB flash storage; non-volatile media
Secondary storage: 80 GB, 10 millisec.; non-volatile media; about 100,000 times slower (like 10 sec vs. 10 days)
CD-RW, DVD-RW, floppy disk
Magnetic tape; serial access
Technologies used in storage

Cache memory: static RAM (SRAM)
Primary memory: dynamic RAM (DRAM)
Secondary memory:
HDD (Hard Disk Drive): stores data magnetically; read and written by a head on a moving arm
SSD (Solid State Disk): electronic disk
CD-ROM (Compact Disc Read-Only Memory): stores data optically, read by laser
WORM (Write Once Read Many)
DVD (Digital Versatile Disc, originally Digital Video Disc): stores data optically, read by laser (smaller wavelength and a different laser
type)
BD (Blu-ray Disc): stores data optically, read by a blue-violet laser
USB flash memory: EEPROM (NAND type)
Roughly 100 times faster than HDD, 100 times more expensive than HDD
Wears out! Wear-leveling techniques lessen the rewrite-limit problem.
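Magnetic tapes:
Cheap storage for archival purposes
Sequential access only
Characteristics of 9-track tape
Tape density: 6250 ~ 30,000 bpi. One frame across the 9 tracks holds 1 byte plus a parity bit, so bits per inch per track (bpi) equals bytes per inch (Bpi).
Tape speed: 30 ~ 200 ips (inches per second)
Gap size: 0.3 ~ 0.75 inch
(Figure: 9-track tape showing frames, tracks, data blocks, and the gaps left between blocks so that the R/W mechanism can stop and start.)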
Tape length
How much tape is needed to store 1 million 100-byte records on a tape
with 6250 bpi density and 0.3-inch gaps?

g: gap length
n: number of data blocks
b: data block length
s = n * (b + g)
Blocking factor (bf) = number of records per block
bf = 1: b = 100 / 6250 = 0.016 inch; s = 1,000,000 * (0.3 + 0.016) = 316,000 inches = 26,333 feet
bf = 50: b = 100 * 50 / 6250 = 0.8 inch; s = (1,000,000 / 50) * (0.3 + 0.8) = 22,000 inches = 1,833 feet
Effective recording density (erd)
= (number of bytes per data block) / (tape length required to store the data block, b + g)
bf = 1: 100 / 0.316 = 316.4 bpi

bf = 50: 100 * 50 / 1.1 = 4545.45 bpi
Tape data transmission rate

Nominal transmission rate = tape density (bpi) * tape speed (ips)
Effective transmission rate = effective recording density (erd) * tape
speed (ips)
For a 200 ips tape, what are the nominal and effective
transmission rates?
Nominal transmission rate = 6250 Bpi * 200 ips = 1250 KBps
bf = 1: effective transmission rate = 316.4 Bpi * 200 ips = 63.3 KBps
bf = 50: effective transmission rate = 4545.45 Bpi * 200 ips = 909 KBps
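The tape formulas above (s = n * (b + g), erd, and the two transmission rates) can be checked with a short calculation. This is a sketch only: the function name tape_stats and its dictionary keys are made up for illustration, and the default values are the ones used in the worked example.

```python
import math

def tape_stats(n_records, record_bytes, bf,
               density_bpi=6250, gap_in=0.3, speed_ips=200):
    """Tape length, effective recording density and transfer rates for one blocking factor."""
    block_in = record_bytes * bf / density_bpi       # b: length of one data block (inches)
    n_blocks = math.ceil(n_records / bf)             # n: number of data blocks
    length_in = n_blocks * (block_in + gap_in)       # s = n * (b + g)
    erd = record_bytes * bf / (block_in + gap_in)    # useful bytes per inch of tape
    return {
        "length_ft": length_in / 12,
        "erd": erd,
        "nominal_Bps": density_bpi * speed_ips,
        "effective_Bps": erd * speed_ips,
    }
```

Calling tape_stats(1_000_000, 100, 1) and tape_stats(1_000_000, 100, 50) gives the two cases worked out above; the larger blocking factor wastes far less tape on gaps.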
Basic organization of a disk (hard or floppy):
Since there is only one data path to the
computer, only one read/write head can
be active at a time.
Disk Organization: Sector, Cluster

Sectors and clusters are abstractions over the raw disk. Grouping sectors into clusters improves disk access by decreasing seek time (ts).
A sector is the fixed-length, smallest addressable portion of a disk.
Typically 512 bytes to 4 KB.
Cylinder-Head-Sector addressing (physical CHS addressing) is used to locate the
sector, which is then brought into memory (a buffer).

The OS does not use CHS addressing; it uses LBA (logical block
addressing), which numbers all sectors from 0 to the last sector. If needed,
firmware on the disk (or the BIOS) converts the LBA address to a CHS address.
A cluster is the smallest unit of space that the OS can allocate to a file.
The OS views a file as a series of clusters. A cluster has a fixed
number of contiguous sectors.
The File Manager (a module of the OS) uses the FAT (File Allocation Table) to tie logical
clusters to physical sectors.
(Figure legend: (A) track, (B) geometrical sector, (C) track sector, (D) cluster)
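The conversion between CHS and LBA addresses mentioned above is usually done with the conventional formula LBA = (C * heads + H) * sectors_per_track + (S - 1), where sectors within a track are numbered from 1. The sketch below assumes that convention; the function names are illustrative, not a real firmware API.

```python
def chs_to_lba(c, h, s, heads_per_cyl, sectors_per_track):
    """Conventional CHS -> LBA mapping; s is 1-based within its track."""
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)

def lba_to_chs(lba, heads_per_cyl, sectors_per_track):
    """Inverse mapping: recover (cylinder, head, sector) from a linear address."""
    c, rem = divmod(lba, heads_per_cyl * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1
```

With an illustrative geometry of 16 heads and 63 sectors per track, the first sector of cylinder 1 is LBA 1008, since one cylinder holds 16 * 63 sectors.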
Disk performance metrics-1

Note: B = byte
1 KB = 1024 B (2^10 B), 1 MB = 1024 KB (2^20 B), 1 GB = 1024 MB (2^30 B)

Capacity = C
Seek time (ts), rotational delay (tr), transfer time (tt)
C = (# of platters) * (# of tracks/platter) * (# of B/track). Example: 80 GB, 160 GB
ts = the time it takes for the actuator to move the disk head from its
current location to the requested track. Example: ts_min = 0, ts_max = 15-20 msec,
ts_ave = (1/3) * ts_max = 5 msec. Usually the largest part of the total access time.
tr = the time spent waiting for the requested sector to rotate under the head.
Example: at 10,000 rpm a full rotation takes 6 msec; on average tr is about 1/2 of a
full rotation, i.e. 3 msec.
tt = (# of bytes transferred / # of bytes on a track) * rotation time.
It depends on the speed at which bytes pass under the disk head to be transferred
to/from memory.
TOTAL ACCESS TIME = ts + tr + tt
transfer_rate: B/msec (sample value = 100 MB/sec)
Example 1
10,000 rpm disk
Bytes / sector = 512
Sectors / track = 170
Tracks / cylinder = 16
Average seek time = 8 msec
Average rotational delay = 3 msec
Transfer rate = (512 * 170) / 6 = about 14,500 bytes / msec
Transfer time for a single sector? 6 / 170 msec = about 0.035 msec
Example 2: on a real disk we transfer at least a sector. Why?

10,000 rpm
ts_ave = 5 msec
Transfer rate = 83 MB/sec
Transfer time for 1 B = 1 B / (83 MB/sec) = 0.000012 msec
Transfer time for 1 KB = 0.012 msec
Average access to 1 B = 5 + 3 + 0.000012 = 8.000012 msec
Average access to 1 KB = 5 + 3 + 0.012 = 8.012 msec
Reading 1 KB costs almost exactly the same as reading 1 B, because seek time and rotational delay dominate. That is why we transfer a whole sector on each access.
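The total access time ts + tr + tt from the two examples can be sketched as a small function. The names below are illustrative; the transfer rate is taken as 83 * 10^6 bytes per second, which is an assumption about how "83 MB/sec" was meant.

```python
def rotation_ms(rpm):
    """Time of one full rotation in milliseconds."""
    return 60_000 / rpm

def access_time_ms(n_bytes, seek_ms, rpm, rate_bytes_per_sec):
    """Average access time = seek + half a rotation + transfer time."""
    rotational_delay = rotation_ms(rpm) / 2
    transfer = n_bytes / rate_bytes_per_sec * 1000
    return seek_ms + rotational_delay + transfer

# Example 1: one sector is 1/170 of a track, so it passes the head in 1/170 of a rotation.
sector_transfer_ms = rotation_ms(10_000) / 170
```

Comparing access_time_ms(1, 5, 10_000, 83e6) with access_time_ms(1024, 5, 10_000, 83e6) shows that the two differ only in the second decimal place, which is the point of Example 2.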
Internal / external fragmentation

Fragmentation means that something is broken
into parts that are detached, isolated, or incomplete.

It may occur at any level of organization: sector, block,
cluster, file...
Since the sector size is fixed, there is usually no
convenient fit between records and sectors. This
leads to internal fragmentation within the
sectors. Similar unused holes appear at other
levels.
Sector spanning (letting a record continue into the next sector) is a simple solution to this
problem. Disadvantage?
Without spanning (- marks unused space):
aaaaa
aa---
bbbbb
With spanning:
aaaaa
aabbb
bbccc
ccc--
A disk may have many small chunks of unallocated blocks but no large
chunks. It may then be impossible to allocate space for a large file even though
the disk has plenty of free space. This is called external fragmentation.
Block-level interface (block, page)
A block is a sequence of bytes.
Advantages:
The OS hides hardware details (such as different sector sizes and different addressing schemes) behind the block-level interface. The OS
maintains the mapping between blocks and sectors.
Blocking increases throughput (the rate of successful data transfer).
The block size is at least one sector and is determined by the OS. The OS views the disk as a series of blocks.
Block numbers start from 0.

BF (blocking factor) = number of records stored in each block.
A page is a block-sized area allocated in main memory.
The OS provides several methods to access disk blocks:
readblock(n,p): read the data in block n into page p.
writeblock(n,p): write the data in page p to block n.
allocate(k,n): find k contiguous available blocks, as close to block n as possible, and mark them as used.
deallocate(k,n): mark the k contiguous blocks starting at block n as available.
The OS keeps track of which blocks are available for allocation. Two basic strategies exist:
Disk map
Free list
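The disk map strategy can be sketched as a bitmap with one flag per block. The class below is a hypothetical illustration of allocate(k,n) and deallocate(k,n) as described above, not the interface of any particular OS; it scans linearly, which a real implementation would avoid.

```python
class DiskMap:
    """One flag per block: True means the block is in use."""

    def __init__(self, n_blocks):
        self.used = [False] * n_blocks

    def allocate(self, k, n):
        """Mark k contiguous free blocks, as close to block n as possible, as used.
        Returns the first block number, or None if no run of k free blocks exists."""
        candidates = [
            start for start in range(len(self.used) - k + 1)
            if not any(self.used[start:start + k])
        ]
        if not candidates:
            return None  # external fragmentation: free space exists but not contiguously
        start = min(candidates, key=lambda s: abs(s - n))
        for block in range(start, start + k):
            self.used[block] = True
        return start

    def deallocate(self, k, n):
        """Mark the k contiguous blocks starting at block n as available."""
        for block in range(n, n + k):
            self.used[block] = False
```

A free list would instead chain together runs of free blocks; the bitmap is shown here only because it is the shorter of the two to write down.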
Block size:
Block contention increases with larger blocks. Thus OLTP and web
search applications, which are dominated by random access, prefer small blocks.
Decision support and data warehouse applications, which are dominated by sequential access,
prefer large blocks.

In terms of                   Preferred size   Application
Block contention              Small            OLTP
Random row access speed       Small            OLTP
Sequential row access speed   Large            Decision support, data warehouse
Disk performance metrics-2 (additional metrics)
Block transfer time (btt) = B / transfer_rate, where B is the block size
Bulk transfer rate (btr) = the rate of transferring the useful
bytes in the blocks (useful data transfer rate)

btr = (B / (B + G)) * transfer_rate
(G is the gap size)
Time to transfer k consecutive blocks on the same
cylinder:
ts + tr + k * (B / btr)
File-level interface
The client views a file as a sequence of bytes (no notion of block or
sector). The client can directly access any byte in the file.

The OS hides the details from the client. For example, blocks
are accessed through pages, and I/O buffers are allocated behind the scenes.
How many disk accesses do
f.read() and f.write() require?
See the journey of a byte later in these
notes.
The f.seek() method performs 2 conversions:

specified byte position -> logical block reference (simple)
logical block reference -> physical block reference
(depends on the file system implementation)
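The first, simple conversion done by a seek is a division by the block size. A minimal sketch, with locate as a hypothetical helper name:

```python
def locate(byte_pos, block_size):
    """Map a byte position in a file to (logical block number, offset inside that block)."""
    return divmod(byte_pos, block_size)
```

For example, with 4096-byte blocks, byte 10,000 lies in logical block 2 at offset 1808. Turning that logical block number into a physical block is the second conversion, and it depends on how the file system implements files.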
File implementation strategies

Contiguous allocation: stores each file as a sequence of contiguous blocks. The simplest strategy. Suffers from both
internal and external fragmentation.
Extent-based allocation: similar to contiguous allocation. Reduces internal and external
fragmentation by storing a file as a sequence of fixed-length extents. The file is extended one extent at a
time.
Indexed allocation: extends the file one block at a time. Least possible amount of
fragmentation. Keeps track of the blocks allocated to the file with a special index block.
Large files need multiple levels of indexing. Example: the UNIX file system.
Example (file "Junk", extent size: 8 blocks):
Contiguous allocation (clustering): Junk's logical block 21 is physical block 53.
Extent-based allocation: Junk's logical block 21 is physical block 701.
Indexed allocation: Junk's logical block 2 is physical block 16.
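The logical-to-physical conversion under the three strategies can be sketched as follows. The starting block 32, the extent start blocks [32, 480, 696], and the index block contents [32, 103, 16] are assumptions chosen only so that the results match the block numbers quoted in the example; the original figure is not reproduced here.

```python
EXTENT_SIZE = 8  # blocks per extent, as in the example

def contiguous_block(first_block, logical):
    """Contiguous allocation: the file occupies one run starting at first_block."""
    return first_block + logical

def extent_block(extent_starts, logical, extent_size=EXTENT_SIZE):
    """Extent-based allocation: find the extent, then the offset inside it."""
    extent, offset = divmod(logical, extent_size)
    return extent_starts[extent] + offset

def indexed_block(index_block, logical):
    """Indexed allocation: the index block lists one physical block per logical block."""
    return index_block[logical]
```

Logical block 21 falls in extent 21 // 8 = 2 at offset 5, which is why the extent-based answer is far from the contiguous one.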
Example #3 (effect of sector spanning)

512 bytes per sector
63 sectors per track
16 tracks per cylinder
4092 cylinders
Disk capacity? 512 * 63 * 16 * 4092 = 2,111,864,832 bytes (about 1.97 GB)
We have a file with 50,000 fixed-length records. How many cylinders does
the file require if each record is 240 bytes long?
With sector spanning:
Cylinder capacity = 512 * 63 * 16 = 516,096 bytes
File size = 50,000 * 240 = 12,000,000 bytes
Number of cylinders required = 12,000,000 / 516,096 = 23.25
Without sector spanning (internal fragmentation):
Each sector holds 2 records (480 bytes, 32 bytes unused), so the file requires 25,000 sectors
63 * 16 = 1008 sectors per cylinder
25,000 / 1008 = 24.8 cylinders required
Analysis:
Sector spanning has an advantage: the file requires less space.
On the other hand, some records can only be retrieved by accessing two sectors. This is the disadvantage.
Observe the same fragmentation problem at other levels (such as clusters).
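The two cases of Example #3 can be reproduced with a short calculation. This is a sketch; cylinders_needed is an illustrative name and the default geometry is the one given in the example.

```python
import math

def cylinders_needed(n_records, record_bytes, spanning,
                     sector_bytes=512, sectors_per_track=63, tracks_per_cyl=16):
    """Cylinders occupied by a file of fixed-length records, with or without sector spanning."""
    sectors_per_cyl = sectors_per_track * tracks_per_cyl
    if spanning:
        # Records are packed back to back, so no space inside a sector is wasted.
        sectors = n_records * record_bytes / sector_bytes
    else:
        # Only whole records fit in a sector; the remainder is internal fragmentation.
        records_per_sector = sector_bytes // record_bytes
        sectors = math.ceil(n_records / records_per_sector)
    return sectors / sectors_per_cyl
```

Trying other record sizes (for example 256 bytes, which divides the sector size evenly) shows that the gap between the two cases disappears when records fit sectors exactly.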
Example #4 (file access performance)

10,000 rpm disk, 512 B/sector, 170 sectors/track, 16 tracks/cylinder,
ts_ave = 8 msec. Allocate a file of 34,000 records, each 256 bytes long,
a) with a cluster size of 4 KB,
b) with a cluster size of 1 track.
File size: 34,000 * 256 B = 8,704,000 B
Track size: 170 * 512 = 87,040 B, so the file occupies 100 tracks (not contiguous)
Cluster size: 4096 B, which is 4096 / (170 * 512) = 1/21.25 of a track, so the file occupies 2125 clusters
Track-based sequential access = (8 msec + 3 msec + 6 msec) * 100 = 1.7 sec
Cluster-based random access = (8 msec + 3 msec + (1/21.25) * 6 msec) * 2125 = 23.97 sec
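Both parts of Example #4 follow the same pattern: every cluster costs one seek, one rotational delay, and a transfer proportional to the cluster size. A minimal sketch under that assumption (the function name is illustrative):

```python
def file_read_time_ms(file_bytes, cluster_bytes, track_bytes=170 * 512,
                      seek_ms=8.0, rot_delay_ms=3.0, rotation_ms=6.0):
    """Time to read a file whose clusters are scattered, one access per cluster."""
    n_clusters = -(-file_bytes // cluster_bytes)            # ceiling division
    transfer_ms = rotation_ms * cluster_bytes / track_bytes  # fraction of a rotation
    return n_clusters * (seek_ms + rot_delay_ms + transfer_ms)

FILE_BYTES = 34_000 * 256
track_based_ms = file_read_time_ms(FILE_BYTES, 170 * 512)
cluster_based_ms = file_read_time_ms(FILE_BYTES, 4096)
```

The transfer component is the same in both cases (100 full rotations in total); the difference comes entirely from paying the seek and rotational delay 2125 times instead of 100 times.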
Example #5 (random sector access time)

# of platters: 4
8192 tracks / platter surface
256 sectors / track
512 bytes / sector
Disk diameter: 3.5 inches (1 inch = 2.54 cm)

Gaps take 10% of the track space.
Rotation speed: 3840 rpm
The head takes 1 msec for every 500 cylinders plus 1
msec for start/stop.
What are the best, worst, and average random sector I/O times?
min. = 0.05 msec

max. = 33.05 msec
ave. = 14.65 msec
If a block is 4096 bytes, what are the best, worst, and average random block
I/O times?
min. I/O = transfer time = 0.5 msec
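The best and worst cases of Example #5 can be approximated as below. The sketch assumes that the best case is pure transfer time (no seek, no rotational wait), that the worst case is a seek across all 8192 cylinders plus one full rotation plus the transfer, and that a block of k sectors passes over k sectors and the k - 1 gaps between them. These modelling choices and the function name are assumptions, so the results agree with the quoted figures only up to rounding.

```python
def io_bounds_ms(rpm=3840, sectors_per_track=256, gap_fraction=0.10,
                 cylinders=8192, sectors_per_unit=1):
    """Return (best, worst) I/O time in msec for a unit of sectors_per_unit sectors."""
    rotation = 60_000 / rpm                      # one full rotation
    slot = rotation / sectors_per_track          # one sector together with its gap
    transfer = slot * (sectors_per_unit * (1 - gap_fraction)
                       + (sectors_per_unit - 1) * gap_fraction)
    max_seek = 1 + cylinders / 500               # start/stop plus full sweep
    return transfer, max_seek + rotation + transfer

sector_best, sector_worst = io_bounds_ms()
block_best, block_worst = io_bounds_ms(sectors_per_unit=4096 // 512)
```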
Example #6 (file access performance)

# of platters: 4
8192 tracks / platter surface
256 sectors / track
512 bytes / sector
Disk diameter: 3.5 inches (1 inch = 2.54 cm)

Gaps take 10% of the track space.
Rotation speed: 3840 rpm
The head takes 1 msec for every 500 cylinders plus 1 msec
for start/stop.
How long does it take to read 1 MB of data that is
stored in consecutive tracks?
Answer: 138.8 msec
If all sectors are scattered over the disk:
30,003.2 msec (about 0.5 minutes)
How long does it take to read 1000 MB of data that is
stored in consecutive cylinders?
Answer: 126,005.8 msec (2 minutes and 6 seconds)
If all sectors are scattered over the disk:
30,003,200 msec (8 hours and 20 minutes)
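The scattered-sector figures in Example #6 are simply the number of sectors multiplied by the average random sector I/O time of 14.65 msec from Example #5. A minimal sketch (constant and function names are illustrative):

```python
SECTOR_BYTES = 512
AVG_RANDOM_SECTOR_MS = 14.65  # average random sector I/O time from Example #5

def scattered_read_ms(n_megabytes):
    """Time to read n_megabytes when every sector needs its own random access."""
    n_sectors = n_megabytes * 1024 * 1024 // SECTOR_BYTES
    return n_sectors * AVG_RANDOM_SECTOR_MS

def to_hours(ms):
    return ms / 3_600_000
```

1 MB is 2048 sectors, so the scattered read already takes about half a minute; for 1000 MB, to_hours(scattered_read_ms(1000)) is a little over 8.3 hours, against roughly two minutes for the contiguous layout.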
Example #6, continued (effect of disk access algorithms)

Requested   Arrival   Completion time   Completion time
cylinder    time      (FIFO)            (elevator)
1000        0         7.85              7.85   (1.)
3000        0         20.7              20.7   (2.)
7000        0         37.55             37.55  (3.)
2000        20        56.6              77.5   (6.)
8000        30        77.45             48.4   (4.)
5000        40        92.3              63.25  (5.)

The numbers in parentheses give the order in which the elevator algorithm serves the requests.
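The difference between the two scheduling policies can be sketched with a small simulation. The cost model below is an assumption based on the disk of Example #5: a seek costs 1 + distance/500 msec (nothing if the head does not move), followed by 7.8 msec of average rotational delay and 0.05 msec of transfer, with the head starting at cylinder 1000 and initially moving toward higher cylinders. The sketch is meant to show the service order; its completion times may not match every figure in the table.

```python
def service_ms(head, cylinder):
    """Assumed cost of one request: seek + average rotational delay + sector transfer."""
    distance = abs(cylinder - head)
    seek = 0.0 if distance == 0 else 1.0 + distance / 500.0
    return seek + 7.8 + 0.05

def schedule(requests, policy, head=1000):
    """requests: list of (cylinder, arrival_ms). Returns [(cylinder, completion_ms)]."""
    pending = list(requests)
    clock, direction, completed = 0.0, 1, []
    while pending:
        ready = [r for r in pending if r[1] <= clock]
        if not ready:                       # idle until the next request arrives
            clock = min(r[1] for r in pending)
            continue
        if policy == "fifo":
            chosen = min(ready, key=lambda r: r[1])
        else:                               # elevator: keep moving in one direction
            ahead = [r for r in ready if (r[0] - head) * direction >= 0]
            if not ahead:                   # nothing left ahead, so reverse
                direction = -direction
                ahead = ready
            chosen = min(ahead, key=lambda r: abs(r[0] - head))
        clock += service_ms(head, chosen[0])
        head = chosen[0]
        pending.remove(chosen)
        completed.append((chosen[0], round(clock, 2)))
    return completed

REQUESTS = [(1000, 0), (3000, 0), (7000, 0), (2000, 20), (8000, 30), (5000, 40)]
```

Under the elevator policy the head continues upward from 7000 to 8000 before turning back, so the request for cylinder 2000 is served last even though it arrived before the requests for 8000 and 5000.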
A journey of a byte
Write (append) the 1-byte value in ch in the
program to textfile:
write (textfile, ch, 1)

Logical level:
The program makes a system call to the OS.
The File Manager (FM) handles the request.
FM accesses the information about textfile,
i.e. the physical location (cylinder, track, ...) of the file.
FM uses the FAT to find the location
of the sector that is to contain the byte.
FM finds an available I/O buffer, then reads
the sector from disk into the system buffer in main memory (MM).
Physical level:
FM writes ch into the appropriate
place in the copy of the sector in MM.
FM tells the I/O processor where the byte is stored
in MM and where it needs to be sent on the disk.
The I/O processor checks whether the drive is available and
buffers the data in chunks of the proper size for the disk.
The I/O processor sends the data to the disk controller.
The controller instructs the drive to move the arm to the
proper track, waits until the proper sector comes
under the head, and then sends the sector bit by bit.
Do not send the sector immediately to
disk. Why?
In which case do we need to send it
immediately?
I/O processor / direct memory access (DMA) controller
I/O processor: handles the task of communicating with the disk,
independently of the main CPU.

The I/O processor (a special-purpose device) takes
commands from the OS and communicates with the disk
controller.
Once the buffer is full, the I/O processor sends the sector's bytes,
one at a time, as soon as the controller is available.
(Figure: path of char c. User program (char c in data area) -> File Manager in OS (char c in system buffer) -> I/O processor / DMA controller -> disk controller -> disk. Points 1 and 2 below are marked on this path.)
1) Move mode / locate mode: to eliminate data transfer overhead (OH).
2) Scatter input / gather output (vectored I/O): to eliminate the two-step process of separating the overhead (OH) from the useful data of a block.
Disk Controller
The disk controller controls the disk while hiding the details of disk access.
The disk controller is an interface between the computer and the disk drive. It transfers read/write requests to and from the disk,
controls the disk arm, provides reliability by applying checksums to sectors, and remaps bad sectors.
The disk controller moves the head to the correct position (correct track, correct sector) for reading
and writing.
ATA (Advanced Technology Attachment) = IDE (Integrated Drive Electronics)
Example: 133 MB/s with ATA/133, 150 MB/s with SATA (Serial ATA), ATAPI
A PC has 2 IDE ports; each port can access at most 2 disks, one master and one slave.
SCSI (Small Computer System Interface): a system bus standard that coordinates many types of
devices on a single bus. Provides a basis for RAID disks.
Example: at most 16 devices in Ultra 320 SCSI.

               IDE                       SCSI
Cost           Cheap                     Expensive
# of devices   4 (2 ports x 2 disks)     16
Maintenance    Easy                      Hard
Usage          At home                   Business
Speed          133 MB/s                  320 MB/s
Disk Bottleneck, Improving Disk Access Time

CPU speed (and the speed of a high-performance network) is dramatically higher
than the disk I/O transfer rate. This causes the disk bottleneck.

Solutions (read 3.1.8):
Multiprogramming (the CPU works on other jobs while waiting for the data
to arrive)
Cylinders (keep related tracks close together, ideally in the same cylinder)

Disk cache
Parallelism (example: disk striping, RAID). This helps to achieve
better reliability as well.
Efficient use of RAM (buffering)
Disk Cache
A disk cache is a kind of buffering: a block of memory set aside to
contain blocks of data from the disk. The disk cache is bundled with the disk
drive.
It improves performance.
When data is requested from secondary storage, the file manager first looks
in the disk cache to see whether it contains the requested data.
Compare the following access times:
transfer a sector = ts + (1/2) * rotation time + sector transfer time
transfer a track = ts + rotation time
Which one do you prefer? Transferring a sector or transferring a track?

The real value of a disk cache is prefetching.
Disk Striping
Two 20 GB drives are almost always faster than a single 40 GB drive, because
two sectors can be accessed simultaneously. Two problems arise:

Cost increases.
Load balancing is required so that the disks work as uniformly as possible.
Using two 20 GB drives is efficient only if both can be kept busy.
To increase efficiency, we have to balance the workload among the multiple
disks. Disk striping can be used to balance the workload.

Disk striping uses the disk controller to hide the smaller disks from the OS, giving it
the illusion of a single large disk.
Striping distributes the database equally among the
small drives.
Disk reliability, improving disk reliability

Two causes of reduced reliability:
The magnetic material can degenerate.
Head crash
Two approaches to increase reliability. These tasks are again governed by the disk controller.
Mirroring: high speed and robust.
Problem: the cost is high because of the large number of disks.
Storing parity (use a single disk to back up any number of other disks).
2 problems:
Note that there are 4 disk accesses for a single sector write.
More vulnerable to non-recoverable multi-disk failure.
RAID
In high-speed networks, a Storage Area Network (SAN) provides RAID
(redundant array of independent disks) organization. RAID
supports large data volumes and provides reliability, resource sharing, performance
improvement, and disk striping.
RAID-0: striping only, no guard against disk failure
RAID-1: mirrored striping
RAID-2: bit striping, with error-correcting codes instead of parity (difficult
to implement, no longer used)
RAID-3: byte striping and 1 parity disk
RAID-4: sector striping and 1 parity disk
RAID-5: similar to RAID-4, but the parity information is distributed
among the disks (every Nth sector of each disk stores parity information)
RAID-6: similar to RAID-5, but stores 2 kinds of parity information and thus needs
another disk for the additional parity information
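Striping and parity can both be illustrated in a few lines. The sketch assumes round-robin sector striping (virtual sector v goes to disk v mod N, sector v div N) and XOR parity, which is the usual scheme behind the single parity disk of RAID-4; the function names are illustrative.

```python
from functools import reduce

def stripe(virtual_sector, n_disks):
    """Round-robin striping: return (disk number, sector number on that disk)."""
    return virtual_sector % n_disks, virtual_sector // n_disks

def parity(sectors):
    """XOR of equally sized sectors, computed byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

def reconstruct(surviving_sectors, parity_sector):
    """Rebuild the single missing sector from the survivors and the parity sector."""
    return parity(list(surviving_sectors) + [parity_sector])
```

Updating one data sector requires reading the old data and old parity and then writing both back, which is where the 4 disk accesses per sector write come from. Because XOR can cancel only one unknown, the loss of two disks at once cannot be recovered with a single parity disk.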
Buffer Management
Work with large chunks of data in MM so that
data in memory can be read multiple times (caching) and
the number of disk accesses is reduced.
System buffers vs. user buffers
Buffer management by the OS: organizing more than 2 buffers, including system buffers.
Coordination uses replacement techniques such as Least Recently Used (LRU), FIFO, and the clock replacement
algorithm.
How many buffers do we need?
1: Even if the program transmits data in only one direction, a single buffer causes
problems such as I/O-bound processing. The CPU wants to fill the buffer at the
same time that I/O is being performed; overlapping I/O and CPU work is possible ONLY
with at least 2 buffers (Fig. 3.22).
2: At least 2 (one for input, the other for output): similar problems still occur.
Solution: apply a multiple buffering strategy. Tradeoff: as the cost of memory decreases, using
many buffers becomes possible, but the more buffers there are, the more complex the management
that is required.
(Figure: buffer pool with 4 buffers (pages), 1 buffer for each page.)
Transfer a buffer to the disk with 1 access when either:
The page is being replaced
The file is closed
It is required for data integrity purposes (recovery
management)
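The write-back rule above (flush a buffer only when its page is replaced or the file is closed) can be sketched with a small LRU buffer pool. The class and its attribute names are hypothetical, and a plain dictionary stands in for the disk; recovery-driven flushing is left out.

```python
from collections import OrderedDict

class BufferPool:
    """LRU buffer pool that writes a dirty page to disk only on replacement or close."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # dict: block number -> bytes (stand-in for the disk)
        self.pages = OrderedDict()       # block number -> [data, dirty flag], LRU first
        self.disk_writes = 0

    def _flush(self, block, data):
        self.disk[block] = data
        self.disk_writes += 1

    def _load(self, block):
        if block in self.pages:
            self.pages.move_to_end(block)            # mark as most recently used
            return self.pages[block]
        if len(self.pages) >= self.capacity:
            victim, (data, dirty) = self.pages.popitem(last=False)
            if dirty:                                # the page is being replaced
                self._flush(victim, data)
        page = [self.disk.get(block, b""), False]
        self.pages[block] = page
        return page

    def read(self, block):
        return self._load(block)[0]

    def write(self, block, data):
        page = self._load(block)
        page[0], page[1] = data, True                # changed in memory only

    def close(self):
        for block, page in self.pages.items():       # the file is closed
            if page[1]:
                self._flush(block, page[0])
                page[1] = False
```

Writing the same block several times while it stays in the pool costs a single disk write, which is the saving that buffering is meant to provide.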
