
Application of B & B+ Trees in Storage Allocation
Introduction
• As we have already seen, a database consists of tables, views, indexes, procedures, functions, etc.
• Tables and views are logical ways of viewing the data, but the actual data are stored in physical memory.
• A database is a very large storage mechanism holding lots of data, so it is kept on physical storage devices such as magnetic disks.
• On these devices the data cannot be stored as they are; they are converted to binary format.
• Each storage device has many data blocks, each capable of storing a certain amount of data.
• The data are mapped onto these blocks in order to store them on the device.
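A minimal sketch of how fixed-length records might be mapped onto blocks. The block size, record size and helper names are hypothetical, and the sketch assumes records never span two blocks.

```python
BLOCK_SIZE = 4096   # bytes per disk block (assumed value)
RECORD_SIZE = 100   # bytes per fixed-length record (assumed value)


def blocking_factor(block_size: int, record_size: int) -> int:
    """Number of whole records that fit in one block (records do not span blocks)."""
    return block_size // record_size


def blocks_needed(num_records: int, bfr: int) -> int:
    """Blocks required to hold num_records records (ceiling division)."""
    return -(-num_records // bfr)


def locate(record_no: int, bfr: int) -> tuple[int, int]:
    """Map a 0-based record number to (block number, slot inside that block)."""
    return divmod(record_no, bfr)


bfr = blocking_factor(BLOCK_SIZE, RECORD_SIZE)
print(bfr)                            # 40
print(blocks_needed(1_000_000, bfr))  # 25000
print(locate(1000, bfr))              # (25, 0)
```

Unused bytes at the end of each block (here 4096 - 40 × 100 = 96) are the price of not letting records span blocks.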
Overview of Secondary Storage: Magnetic Disk
Structure of a Magnetic Disk
The primary medium for long-term online storage of data is the magnetic disk. Physically, disks are relatively simple.
• Each disk platter has a flat, circular shape.
• Its two surfaces are covered with magnetic material.
• Information is recorded on the surfaces.
• When in use, a drive motor spins the platter at a constant high speed (for example 90, 120 or 250 revolutions per second).
• A read-write head is positioned just above the surface of the platter.
• The disk surface is logically divided into tracks.
• Tracks are subdivided into sectors.
• A sector is the smallest unit of information that can be read from or written to the disk.
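The platter/surface/track/sector hierarchy above determines raw capacity. This is a sketch with a made-up geometry; it assumes every track has the same number of sectors, whereas real drives use zoned recording with more sectors on outer tracks.

```python
from dataclasses import dataclass


@dataclass
class DiskGeometry:
    platters: int
    tracks_per_surface: int
    sectors_per_track: int
    bytes_per_sector: int

    def capacity_bytes(self) -> int:
        surfaces = 2 * self.platters  # both surfaces of each platter are recorded on
        return (surfaces * self.tracks_per_surface
                * self.sectors_per_track * self.bytes_per_sector)


# Hypothetical geometry, not taken from any real drive
disk = DiskGeometry(platters=4, tracks_per_surface=50_000,
                    sectors_per_track=1_000, bytes_per_sector=512)
print(disk.capacity_bytes() / 10**9, "GB")  # 204.8 GB
```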
Accessing Data on a Magnetic Disk
• A traditional HDD has rotating platters that store data in tracks.
• When data needs to be read or written, the actuator arm must move the head to the track holding the required sector. The time this takes is the seek time.
• The drive then has to wait for the platter to rotate until the required sector passes under the head (rotational latency).
• When we deal with huge amounts of data, this can become a bottleneck, because the head has to keep moving to specific sectors.
• Average seek time ranges from about 4 ms on high-end servers to about 9 ms on common servers.
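Seek time and rotational latency add up to the cost of one random access. A rough sketch, assuming the average rotational latency is half a revolution and ignoring transfer time; pairing 9 ms with 120 rev/s and 4 ms with 250 rev/s is an assumption for illustration.

```python
def avg_rotational_latency_ms(rev_per_sec: float) -> float:
    """On average the wanted sector is half a revolution away from the head."""
    return 1000.0 / (2 * rev_per_sec)


def avg_access_ms(avg_seek_ms: float, rev_per_sec: float) -> float:
    """Average seek time plus average rotational latency (transfer time ignored)."""
    return avg_seek_ms + avg_rotational_latency_ms(rev_per_sec)


# Common server: 9 ms seek, 120 revolutions per second
print(f"{avg_access_ms(9, 120):.2f} ms")   # 13.17 ms
# High-end server: 4 ms seek, 250 revolutions per second
print(f"{avg_access_ms(4, 250):.2f} ms")   # 6.00 ms
```

At roughly 13 ms per random access, a disk manages only about 75 random accesses per second, which is why the number of accesses per operation matters so much.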
Motivation

• So far we have assumed that everything in a search tree is kept in main memory (including balanced trees such as AVL, red-black and splay trees).
• What if the data items contained in a search tree do not fit into main memory?
• Just think about searching the UIDAI database (for AADHAAR details).
• Let us assume there are only 8 bytes of data (say, the AADHAAR ID) per citizen and we have to build a search tree over them.
• The population of India: 1,358,856,931 (live count at the time of writing).
• The search tree will require more than 20 GB of memory (including pointers)!
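The 20 GB figure can be checked with a back-of-the-envelope calculation. This sketch assumes one binary-search-tree node per citizen holding the 8-byte key plus two child pointers, and tries 4-byte and 8-byte pointers; node headers and allocator overhead are ignored.

```python
POPULATION = 1_358_856_931   # figure quoted on the slide
KEY_BYTES = 8                # one AADHAAR ID per citizen


def tree_bytes(n: int, key_bytes: int, pointer_bytes: int) -> int:
    """One node per citizen: a key plus left and right child pointers."""
    return n * (key_bytes + 2 * pointer_bytes)


for ptr in (4, 8):
    gb = tree_bytes(POPULATION, KEY_BYTES, ptr) / 10**9
    print(f"{ptr}-byte pointers: {gb:.1f} GB")
# 4-byte pointers: 21.7 GB
# 8-byte pointers: 32.6 GB
```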
Search Tree on Disk

• Most tree operations (search, insert, delete, etc.) will require O(log₂ n) disk accesses, where n is the number of data items in the search tree.
• The main challenge is to reduce the number of disk accesses.
• An m-ary search tree allows m-way branching.
• As branching increases, the depth decreases.
• A complete binary tree has a height of roughly ⌈log₂ n⌉.
• But a complete m-ary tree has a height of roughly ⌈logₘ n⌉.
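To see how much branching helps, the sketch below computes ⌈logₘ n⌉ for the population figure from the motivation slide. The orders 100 and 256 are hypothetical; integer arithmetic is used instead of `math.log` to avoid floating-point rounding at exact powers.

```python
def levels_needed(n: int, m: int) -> int:
    """Smallest h with m**h >= n, i.e. ceil(log_m n), computed with exact integers."""
    h, reach = 0, 1
    while reach < n:
        reach *= m
        h += 1
    return h


n = 1_358_856_931
for m in (2, 100, 256):
    print(m, levels_needed(n, m))
# 2 31
# 100 5
# 256 4
```

If each level costs one disk access, going from binary to 256-way branching cuts a lookup from about 31 accesses to about 4.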
Cycles to access different types of storage

Storage Type     Access Type   Number of Cycles
CPU registers    Random        1
L1 cache         Random        2
L2 cache         Random        30
Main memory      Random        2.5 × 10^2
Hard disk        Random        3 × 10^7
Stream           Line          5 × 10^3
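The gap between main memory and disk in the table is easier to appreciate as a ratio. A trivial sketch using two cycle counts from the table above:

```python
# Cycle counts taken from the table above
CYCLES = {
    "main memory (random)": 2.5e2,
    "hard disk (random)": 3e7,
}

ratio = CYCLES["hard disk (random)"] / CYCLES["main memory (random)"]
print(f"one random disk access ~ {ratio:,.0f} main-memory accesses")
# one random disk access ~ 120,000 main-memory accesses
```

So saving even a single disk access per lookup is worth a great deal of extra in-memory work, such as scanning many keys inside one node.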


Characteristics of B-Trees

• A B-Tree is a low-depth, self-balancing tree.
• The height of a B-Tree is kept low by putting the maximum possible number of keys in each B-Tree node.
• Generally, the node size of a B-Tree is kept equal to the disk block size.
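If a node must fit in one disk block, the block size fixes the order. A sketch under assumed sizes (4096-byte blocks, 8-byte keys and pointers, 100-byte data items); the helper names are hypothetical. An internal node with m children holds m pointers and m − 1 keys, so we need m·p + (m − 1)·key ≤ B.

```python
def btree_order(block_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Largest m such that m child pointers and m - 1 keys fit in one block:
    m * pointer_bytes + (m - 1) * key_bytes <= block_bytes."""
    return (block_bytes + key_bytes) // (pointer_bytes + key_bytes)


def leaf_capacity(block_bytes: int, record_bytes: int) -> int:
    """Largest k such that k data items fit in one leaf block."""
    return block_bytes // record_bytes


m = btree_order(4096, key_bytes=8, pointer_bytes=8)
k = leaf_capacity(4096, record_bytes=100)
print(m, k)  # 256 40
```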
What is a B-Tree?

• Definition:
A B-Tree of order m is an m-ary tree with the following properties:
• The data items are stored at leaves.
• The non-leaf nodes store up to m − 1 keys to guide the search; key i represents the smallest key in subtree i + 1.
• The root is either a leaf or has between 2 and m children.
• All non-leaf nodes (except the root) have between ⌈m/2⌉ and m children.
• All leaves are at the same depth and have between ⌈k/2⌉ and k data items, for some k.
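A compact sketch of search and insertion for the leaf-oriented B-Tree defined above, where data items live only in leaves and key i is the smallest key in subtree i + 1. The class names and the defaults m = 3, k = 3 are hypothetical; duplicates are ignored and deletion (which needs borrowing and merging) is left out.

```python
from bisect import bisect_right, insort


class Node:
    """Hypothetical node type. A leaf holds sorted data items in `keys`;
    an internal node holds guide keys and children, where keys[i] is the
    smallest key stored in children[i + 1]."""

    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys: list[int] = []
        self.children: list["Node"] = []


class BTree:
    def __init__(self, m: int = 3, k: int = 3):
        self.m = m  # maximum number of children of an internal node
        self.k = k  # maximum number of data items in a leaf
        self.root = Node(leaf=True)

    def search(self, x: int) -> bool:
        node = self.root
        while not node.leaf:
            # keys[i - 1] <= x < keys[i] selects children[i]
            node = node.children[bisect_right(node.keys, x)]
        return x in node.keys

    def insert(self, x: int) -> None:
        split = self._insert(self.root, x)
        if split is not None:  # the root overflowed: add a new root above it
            sep, right = split
            new_root = Node(leaf=False)
            new_root.keys = [sep]
            new_root.children = [self.root, right]
            self.root = new_root

    def _insert(self, node: Node, x: int):
        """Insert x below node; return (separator, new right sibling) if node split."""
        if node.leaf:
            if x in node.keys:
                return None  # duplicates are ignored in this sketch
            insort(node.keys, x)
            if len(node.keys) <= self.k:
                return None
            mid = len(node.keys) // 2
            right = Node(leaf=True)
            right.keys, node.keys = node.keys[mid:], node.keys[:mid]
            return right.keys[0], right  # separator = smallest key of the new leaf
        i = bisect_right(node.keys, x)
        split = self._insert(node.children[i], x)
        if split is None:
            return None
        sep, new_child = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, new_child)
        if len(node.children) <= self.m:
            return None
        mid = len(node.children) // 2
        right = Node(leaf=False)
        up = node.keys[mid - 1]  # smallest key of the right half moves up
        right.keys, right.children = node.keys[mid:], node.children[mid:]
        node.keys, node.children = node.keys[:mid - 1], node.children[:mid]
        return up, right


tree = BTree(m=3, k=3)
for key in [10, 20, 30, 40, 50, 60, 70, 56]:
    tree.insert(key)
print(tree.search(56), tree.search(55))  # True False
```

Splits only ever add a level at the root, which is why all leaves stay at the same depth. Splitting an overfull node in half keeps each half at or above the ⌈m/2⌉ (or ⌈k/2⌉) minimum.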
Searching
Insert 56 into tree
Delete
B+ Tree

• B+-trees are an important variant of B-trees.
• The performance of a B-tree depends heavily on the height of the tree.
• The deeper the tree, the more page lookups (on secondary storage) we need to reach a leaf.
• So what can we do to “flatten” B-trees?
B+ Tree

• If we can increase the branching (number of pointers) in inner nodes, the tree becomes “flatter”.
• Instead of storing data in inner nodes, we store only search keys (they take up less space ⇒ more room for pointers).
• We also link all the leaf nodes, allowing a fast sequential search.
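A rough sketch of why keeping only keys in inner nodes flattens the tree: it compares the fan-out of an inner node that holds whole records with one that holds only keys, and the resulting number of levels. All sizes (4096-byte blocks, 8-byte pointers and keys, 100-byte records, 10^9 records) are assumptions, and the model ignores node headers.

```python
def fanout(block_bytes: int, entry_bytes: int, pointer_bytes: int) -> int:
    """Largest m with m pointers and m - 1 entries in one block."""
    return (block_bytes + entry_bytes) // (pointer_bytes + entry_bytes)


def levels_needed(n: int, m: int) -> int:
    """Smallest h with m**h >= n (exact integer ceil(log_m n))."""
    h, reach = 0, 1
    while reach < n:
        reach *= m
        h += 1
    return h


BLOCK, POINTER, KEY, RECORD = 4096, 8, 8, 100   # assumed sizes in bytes
N = 10**9                                       # assumed number of records

with_records = fanout(BLOCK, RECORD, POINTER)   # inner nodes hold whole records
keys_only = fanout(BLOCK, KEY, POINTER)         # B+ tree: inner nodes hold keys only
print(with_records, levels_needed(N, with_records))  # 38 6
print(keys_only, levels_needed(N, keys_only))        # 256 4
```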
Schema of B+ Tree
B+ Tree

• Definition:
A B+-Tree of order m is an m-ary tree with the following properties:
• The data items are stored at leaves.
• The non-leaf nodes store up to m − 1 keys to guide the search; key i represents the smallest key in subtree i + 1.
• The root is either a leaf or has between 2 and m children.
• All leaves are at the same depth and have up to k data items, for some k.
Example
Advantages
• Since all records are stored only in the leaf nodes, which form a sorted linked list, searching becomes very easy.
• Using a B+ tree, we can perform range retrieval or partial retrieval. Traversing the tree structure and then the linked leaves makes this easier and quicker.
• As the number of records increases or decreases, the B+ tree grows or shrinks. Unlike ISAM, there is no restriction on the size of a B+ tree.
• Since it is a balanced tree structure, inserts, deletes and updates keep the tree balanced, so performance does not degrade.
• Since all the data are stored in the leaf nodes, internal nodes can branch more, which makes the tree shorter. This reduces disk I/O, so B+ trees work well on secondary storage devices.
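Range retrieval over the linked leaves can be sketched as follows. The `Leaf` class and helpers are hypothetical, and the root-to-leaf descent that would normally find the starting leaf is left out: the scan starts from a given leaf and walks right along the links.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list[int]                  # sorted keys stored in this leaf
    next: Optional["Leaf"] = None    # link to the right sibling leaf


def link_leaves(key_lists: list[list[int]]) -> Leaf:
    """Build a chain of linked leaves from already-sorted key lists; return the first leaf."""
    leaves = [Leaf(keys) for keys in key_lists]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves[0]


def range_scan(start: Leaf, lo: int, hi: int) -> list[int]:
    """Collect all keys in [lo, hi], walking right along the leaf links.

    In a real B+ tree, `start` would be the leaf reached by descending from the
    root with key lo; here any leaf at or before that one works."""
    out = []
    leaf = start
    while leaf is not None:
        for key in leaf.keys[bisect_left(leaf.keys, lo):]:
            if key > hi:
                return out
            out.append(key)
        leaf = leaf.next
    return out


first = link_leaves([[5, 8], [12, 15, 19], [23, 27], [31, 40]])
print(range_scan(first, 10, 27))  # [12, 15, 19, 23, 27]
```

After the single descent, the scan touches only the leaves that overlap the range, which is what makes range queries cheap on disk.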
Conclusion

• B+ trees are extensively used in both database systems and file systems because of the efficiency they provide in storing and retrieving data from external memory. Thus B+ trees are a cost-effective way to store data in bulk.
