
Tree Data Structure and Its Application

Summary
In this paper, we focus on trees and on the application of B-trees to disk reading. The definition and
properties of trees are explained briefly. The use of trees, particularly B-trees, in disk reading is then
explained in the last paragraphs of the paper.

Many basic data structures are applied to solve real-life problems. Arrays are simple to
implement and support random access. Linked lists are dynamic and are useful in
applications that require frequent updates. Both of these data structures store data in a
linear structure. However, arrays and linked lists share a drawback: the time required to
find an item. Linked lists in particular allow only sequential access. Other data
structures, such as stacks and queues, can be used to solve more complex problems. A
hash table is also useful for problems that require constant updates, and its average
lookup time complexity can be as low as O(1). (TK, 2017)
To dig deeper into linked data structures, we extend the concept to trees, whose nodes can have
multiple relations. A tree is a data structure made up of nodes linked by directed or undirected
edges. Arrays, linked lists, stacks and queues are linear data structures, whereas a tree is a
nonlinear data structure. A tree can consist of a node called the root together with subtrees. A
tree can also have only one node and no subtrees, or it can be empty, with no nodes at all.
(Comer, 1979)
Properties of a tree include:
1. A tree starts with a node called the root. The root is a parent node, and the nodes connected
directly below it are called its children.
2. Edges are the links connecting nodes.
3. A node without children is called a leaf.
4. The height of a tree is the number of edges on the longest path from the root to a leaf.
5. The depth of a node is the number of edges on the path from the root to that node.

For example, the tree in Figure 1 has “50” as the root. Its nodes are “50”, “40”, “70”, “30”, “45” and “60”.
The lines connecting those nodes are edges. This tree has 3 leaves: “30”, “45” and “60”. The height is 2
since the path from “50” to any of “30”, “45” or “60” has 2 edges. The depth of “40” is 1 because the path
from “50” to “40” has one edge.

Figure 1: A tree
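
The terms above can be checked with a small Python sketch. The figure itself is not reproduced here, so the shape used below is an assumption that is consistent with the description: “40” and “70” are children of “50”, “30” and “45” are children of “40”, and “60” is the child of “70”. The names Node, height, depth and leaves are illustrative only.

```python
class Node:
    """A tree node holding a value and a list of child nodes."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []


def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if not node.children:
        return 0
    return 1 + max(height(child) for child in node.children)


def depth(root, target, current=0):
    """Number of edges from root to the node holding target, or None."""
    if root.value == target:
        return current
    for child in root.children:
        found = depth(child, target, current + 1)
        if found is not None:
            return found
    return None


def leaves(node):
    """Values of all nodes without children, left to right."""
    if not node.children:
        return [node.value]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


# Assumed shape of the tree in Figure 1 (not confirmed by the figure).
root = Node(50, [Node(40, [Node(30), Node(45)]), Node(70, [Node(60)])])

print(height(root))     # 2
print(depth(root, 40))  # 1
print(leaves(root))     # [30, 45, 60]
```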

One of the most widely used kinds of tree is the B-tree, which is used in databases and file systems.
It is more general than a binary tree in that each node in a B-tree can have more than two children.

Figure 2: A B-tree (Source: Wikipedia)

A non-leaf (internal) node in a B-tree can have a variable number of children within a
predefined range; in a 2-3 tree, for example, each internal node has two or three children. When
data is inserted into or removed from a node, the number of children of that node changes, but
the permitted range does not. Because nodes are allowed to be only partly full, some space may
be wasted. However, a B-tree does not need to be rebalanced as frequently as other types of
trees.
Keys play an important role in a B-tree. The keys stored in a node separate the ranges of values
held in its child subtrees. For example, if a node has 3 children, then it holds 2 keys, say a1 and
a2. All values in the leftmost subtree are less than a1, all values in the middle subtree are
greater than a1 but less than a2, and all values in the rightmost subtree are greater than a2.
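
As a minimal sketch of how the keys direct a search, the Python code below picks the child subtree for a value by comparing it against the sorted keys of a node. The class BTreeNode, the functions child_index and search, and the sample keys 10 and 20 (standing in for a1 and a2) are hypothetical names and values chosen for illustration. Insertion, deletion and node splitting are not shown.

```python
from bisect import bisect_left


class BTreeNode:
    """A node with sorted keys; a non-leaf node with k keys has k + 1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf


def child_index(keys, value):
    """Index of the child subtree whose range could contain value."""
    return bisect_left(keys, value)


def search(node, value):
    """Return True if value is stored in the subtree rooted at node."""
    i = child_index(node.keys, value)
    if i < len(node.keys) and node.keys[i] == value:
        return True  # value is one of this node's own keys
    if not node.children:
        return False  # reached a leaf without finding the value
    return search(node.children[i], value)


# Two keys (a1 = 10, a2 = 20) separate three child subtrees.
root = BTreeNode(
    [10, 20],
    [BTreeNode([3, 7]), BTreeNode([12, 15]), BTreeNode([25, 30])],
)

print(child_index(root.keys, 5))   # 0: left of a1
print(child_index(root.keys, 15))  # 1: between a1 and a2
print(child_index(root.keys, 25))  # 2: right of a2
print(search(root, 15))            # True
```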

We now turn to the application of B-trees to disk reading. In database applications, order notation is
used to describe the number of comparison operations a search needs. Finding a record in a sorted table
of N records by binary search takes about ⌈log2 N⌉ comparisons. For one billion records, that is about 30
comparisons.
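
The figure of 30 comparisons can be reproduced with a few lines of Python. This is only a check of the formula quoted in the text; the function name max_comparisons is an illustrative choice.

```python
import math


def max_comparisons(n):
    """Ceiling of log2(n): the comparison bound quoted in the text."""
    return math.ceil(math.log2(n))


print(max_comparisons(10**9))  # 30
```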

Disk drives are ideal places to keep big databases. The issue is that the time to read a record
from a disk far exceeds the time needed to compare keys once the record is available. The time to
read a record includes seek time and rotational delay. Seek time can be 20 milliseconds or more.
The rotational delay averages about half of the rotation period. For a 7200 RPM Seagate disk, a
read takes approximately 10 milliseconds. (Seagate, 2008)
Locating a record on such a disk by binary search takes about 0.2 seconds: a table of about one
million records needs roughly 20 comparisons, each comparison costs one disk read, and each
disk read takes 10 milliseconds. This is unacceptably slow. Here the B-tree plays an important
role. Records on disk are grouped into blocks and ordered by key. A block can store up to 100
records, and the 10 milliseconds is in fact the time to read a whole block. Once the disk head
is in position, one or more blocks can be read with almost no extra delay. Once the search has
narrowed to a single block, the remaining comparisons are made inside that block and need no
further disk reads. By applying a B-tree to disk reading, the access time to a record is
shortened significantly.
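
The arithmetic can be sketched in Python under stated assumptions: a table of one million records (the size at which binary search needs about 20 reads), 10 milliseconds per disk read, and a simplified B-tree in which each node fills one block and branches 100 ways, so that each level of the tree costs one disk read. The constant and function names are illustrative, and the figure of three reads follows from these assumptions rather than from a measurement.

```python
READ_MS = 10       # time for one disk read (one block), as in the text
RECORDS = 10**6    # assumed table size
FANOUT = 100       # assumed branching per block (up to 100 records per block)


def binary_search_reads(n):
    """Ceiling of log2(n), computed exactly with integer arithmetic."""
    return (n - 1).bit_length()


def index_levels(n, fanout):
    """Number of levels needed so that fanout ** levels >= n."""
    levels, reach = 0, 1
    while reach < n:
        reach *= fanout
        levels += 1
    return levels


# One disk read per comparison.
naive_ms = binary_search_reads(RECORDS) * READ_MS
# One disk read per level of the block-sized index.
btree_ms = index_levels(RECORDS, FANOUT) * READ_MS

print(naive_ms)  # 200 milliseconds, i.e. 0.2 seconds
print(btree_ms)  # 30 milliseconds
```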

Ideas and recommendations

Trees and B-trees have various applications. As mentioned above, the B-tree plays an important role in
databases. However, a B-tree can waste resources because some of its internal nodes are not full.
To limit the waste, we can limit the number of internal nodes. Regarding disk reading, to speed up
reading, the first few comparisons of a search should be made as fast as possible, so that disk reads
are needed only for the remaining comparisons and the total reading time is minimized.

References

Comer, Douglas (June 1979). "The Ubiquitous B-Tree". Computing Surveys. 11 (2): 123–137.
doi:10.1145/356770.356776.

Seagate Technology LLC, Product Manual: Barracuda ES.2 Serial ATA, Rev. F., publication 100468393,
2008, page 6

TK, L. (2017, November 05). Everything you need to know about tree data structures. Retrieved from
https://medium.freecodecamp.org/all-you-need-to-know-about-tree-data-structures-bceacb85490c
