DSC Handout PDF

Handout: Data Structures with C
Version: DSC/Handout/0307/2.1
Date: 05-03-07
Cognizant
500 Glen Pointe Center West
Teaneck, NJ 07666
Ph: 201-801-0233
www.cognizant.com
Data Structures with C
TABLE OF CONTENTS
Introduction
About this Document
Target Audience
Objectives
Pre-requisite
Session 1: Introduction to Data Structure

Learning Objectives
Overview
Summary
Test your Understanding
Session 2: Arrays

Learning Objectives
Overview
Summary
Session 4: Linked Lists

Linked lists
Summary
Session 6: Sorting and Searching

Sorting
Summary
Session 8: Trees

Overview:
Summary
Page 2
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Session 10: Balanced trees and hashing

Overview:
Hashing
Summary
Session 11: Graphs

Graphs
Summary
Glossary
References
Websites
Books
STUDENT NOTES:
Introduction
About this Document

This module provides the participants with the basic knowledge to understand data structures
and to measure the performance of various algorithms used in different problems.
Target Audience
In-Campus Trainees
Objectives
Acquire the basic knowledge on data structures
Select the appropriate data structures for the application
Analyze the complexity of the algorithm
Apply data structures using C
Pre-requisite
The participants must have basic knowledge in writing programs using C.
Session 1: Introduction to Data Structure
Learning Objectives
After completing this chapter, you will be able to:
Define a data structure
List the types of data structures
Identify how to analyze and select data structure for a particular application
Overview
Study of computer science involves study of organization, manipulation and utilization of data in a
computer in order to improve the efficiency of the processor and memory.
Data type and data structure
Data can be represented in the form of binary digits in memory. A binary digit can be stored using
the basic unit of data called bit. A bit can represent either a zero or a one.
Data type
A data type defines the specification of a set of data and the characteristics for that data. A data
type is derived from the basic nature of the data that are stored for processing rather than from
their implementation.
Data Structure
Data structure refers to the actual implementation of the data type and offers a way of storing data
in an efficient manner. Any data structure is designed to organize data to suit a specific purpose so
that it can be accessed and worked on in appropriate ways, both effectively and efficiently. In
computer programming, a data structure may be selected or designed to store data for the
purpose of working on it by various algorithms.
The choice of a data structure begins from the choice of an abstract data type. Data structures are
implemented using the data types, references and operations on them that are provided by a
programming language.
Example data structures include:

Arrays
Stacks
Queues
Linked Lists
Abstract Data Types (ADT)

An Abstract Data Type (ADT) defines data together with the operations. ADT is specified
independently of any particular implementation. ADT depicts the basic nature or concept of the
data structure rather than the implementation details of the data. A stack or a queue is an example
of an ADT. Both stacks and queues can be implemented using an array or using a linked list.
Types of Data Structures

The different types of data structures include linear data structures, hash tables and non linear
data structures. The structure of a data file defines how records, or rows of data, are related to
fields, or columns of data.
Linear structures
A data structure is said to be linear if its elements form a sequence or a linear list.
Some of the linear structures are:

Array: Fixed-size
Linked-list: Variable-size
Stack: Add to top and remove from top
Queue: Add to back and remove from front
Priority queue: Add anywhere, remove the highest priority
Possible operations on these linear structures include:
Traversal: Travel through the data structure
Search: Traversal through the data structure for a given element
Insertion: Adding new elements to the data structure
Deletion: Removing an element from the data structure
Sorting: Arranging the elements in some type of order
Merging: Combining two similar data structures into one
Hash table
A hash table, or a hash map, is a data structure that associates keys with values. A function
termed as Hash function is applied on the key to find the address of the record.
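As a minimal sketch of the idea above, the following C function maps an integer key to a slot of a table using the remainder of a division. The name hash_key and the table size TABLE_SIZE are assumptions made for this illustration only; the handout does not prescribe a particular hash function, and collisions (two keys mapping to the same slot) are not handled here.

```c
#define TABLE_SIZE 10   /* hypothetical number of slots in the table */

/* Division-method hash: maps a key to a slot index in 0..TABLE_SIZE-1. */
unsigned int hash_key(unsigned int key)
{
    return key % TABLE_SIZE;
}
```

With this sketch, a record with key 27 would be placed in slot 7 of the table.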
Non linear structures

A data structure is said to be non linear if its elements are not in a sequence. The elements in the
data structure are not arranged in a linear manner; rather it has a branched structure.
Some of the non linear structures are:

Tree: Collection of nodes represented in hierarchical fashion
Graph: Collection of nodes connected together through edges
Selecting a Data Structure

Data structures that suit certain applications may not suit certain other applications. The choice of
the data structure often begins from the choice of an abstract data structure – an abstract storage
for data defined in terms of the set of operations to be performed on data and computational
complexity for performing these operations, regardless of the implementation in a concrete data
structure.
Selection of an abstract data structure is crucial in the design of efficient algorithms and in
estimating their computational complexity, while selection of concrete data structures is important
for efficient implementation of algorithms. The names of many abstract data structures and
abstract data types match the names of concrete data structures.
In the design of many types of programs, the choice of data structures is a primary design
consideration, as experience in building large systems has shown that the difficulty of
implementation and the quality and performance of the final result depend heavily on choosing
the best data structure.
Performance Analysis and Measurements

Performance analysis is often made in terms of best, worst and average cases of a given
algorithm. This expresses the resource usage as minimum, maximum, and average respectively.
The resource includes the running time, memory and any other resource. In real-time computing,
the worst case execution time is often of particular concern since it is important to know how much
time might be needed in the worst case to guarantee that the algorithm would always finish on
time.
Average performance and worst case performance are the most used in algorithm analysis. Less
widely found is best case performance. The best case performance is measured usually to
improve accuracy of an overall worst case analysis. Computer scientists use probabilistic analysis
techniques, especially expected value, to determine expected average running times.
Worst case performance analysis and average case performance analysis have similarities, but
usually require different tools and approaches in practice.
Determining what average input means is difficult. The complexity is analyzed based on the input
in general. Based on the nature of input, it is difficult to analyze equations in average case, and
hence it is difficult to characterize the complexity mathematically.
Worst case analysis has similar problems. Typically it is difficult to determine the exact worst case
scenario. Instead, a scenario is considered which is at least as bad as the worst case. For
example, when analyzing an algorithm, it may be possible to find the longest possible path through
the algorithm.
It is always important to find the efficiency of an algorithm with respect to the following:
CPU (time) usage
memory usage
disk usage
network usage
Measurement of complexity
Big O notation or Big Oh notation
Big O notation (Big Oh notation) expresses an upper bound on how the time required by an
algorithm grows as the size of its input, N, grows. It is denoted using the symbol ‘O’. It is used in
the analysis of the complexity of algorithms and characterizes a function's behavior for very large
inputs in a simple way.
The measurement of complexity for different scenarios is expressed as follows:

For a method which executes in constant time, the complexity is given by O(1)
For a method which executes in linear time, the complexity is given by O(N)
For a method which executes in quadratic time, the complexity is given by O(N^2)
Determination of complexities
Determining the complexity of an algorithm depends on the statements being used in the
algorithm. For different types of statements the complexity is given below
Sequence of statements
Statement 1;
Statement 2;
...
Statement n; // none of the statements are loops, all are independent statements
Time period can be given by

Total time = time (statement 1) + time (statement 2) + ... + time (statement n)
If each statement is simple, then the time for each statement is constant, and hence the
total time is also constant. This makes the complexity O(1).
Selection statement (if-then-else)

if (condition)
    Sequence of statements 1;
else
    Sequence of statements 2;
Here, either the sequence of statements 1 will be executed or sequence of statements 2
will be executed. So, the worst case complexity for the entire selection statement depends
on the complexity of sequence 1 and sequence 2. If sequence 1 has the complexity O(1)
and sequence 2 has the complexity O(N), the worst case complexity is taken as O(N).
Looping statement (for)

for (condition)
    Sequence of simple statements;
Here, considering that the loop executes N times, the complexity can be given by N * O(1)
which is equivalent to O(N).
Nested loops
for (condition 1)
    for (condition 2)
        Sequence of simple statements;
Here, considering that the outer loop executes N times and the inner loop executes M
times, the complexity can be given by N * M * O(1). i.e., the complexity can be given as
O(N*M)
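The loop rules above can be checked by counting how many times the loop body runs. The sketch below is only an illustration: the function names count_single_loop and count_nested_loops are hypothetical, and the body of each loop is a single simple statement that costs O(1).

```c
#include <stddef.h>

/* The body runs N times, so the complexity is N * O(1) = O(N). */
size_t count_single_loop(size_t n)
{
    size_t steps = 0;
    size_t i;
    for (i = 0; i < n; i++)
        steps++;                /* one simple statement */
    return steps;
}

/* The body runs N * M times, so the complexity is O(N*M). */
size_t count_nested_loops(size_t n, size_t m)
{
    size_t steps = 0;
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = 0; j < m; j++)
            steps++;            /* one simple statement */
    return steps;
}
```

Calling count_nested_loops(4, 5) executes the inner statement 20 times, which matches N * M with N = 4 and M = 5.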
Summary
Study of data structure deals with the actual implementation of the data type and
offers a way of storing data in an efficient manner.
An Abstract Data Type (ADT) is a data type together with the operations, whose
properties are specified independently of any particular implementation
The different types of data structure available are:
o Linear
o Hash table
o Trees
o Graphs
A well-designed data structure allows a variety of critical operations to be performed,
using as few resources, both execution time and memory space, as possible.
Big O Notation can be made use of for the analysis of the complexity of algorithms.
Test your Understanding

1. The complexity of an algorithm which finds the sum of n numbers will be
a. O(n log n)
b. O(n^2)
c. O(n)
d. O(2^n)
2. Parent–Child relationship can be considered as a linear data structure

a. True
b. False
Answers
1. c
2. b
Session 2: Arrays
Learning Objectives
Define arrays
Use arrays as data structures
Overview
An array is a collection of individual values of the same data type stored in consecutive memory
locations.
An array index (the position in the array) usually starts from 0. In some languages the starting
value of the index can be specified; in C, the index always starts from 0.
Here is an array of integers:
myArray
13 5 12 3 6 Array values
0 1 2 3 4 Array positions/Index
Declaring an array in C
int CArray[10];
Referring to elements of the array

The position of an element in an array is given by the index. The name of the array, followed by
the index, is used to refer to a particular element:
myArray[1] = 5;
The above statement assigns the value 5 to the element at position 1 (the second element) of the
array, myArray.
Using elements of an array

Elements of the array can be used in the same way as variables of the same data type can be
used. i.e. an element of an array of integers can be used anywhere an integer variable can be
used.
printf("The fifth element of the array is %d", myArray[4]);
The above statement prints the 5th element in myArray. i.e, it will print as follows:
The fifth element of the array is 6
Example: Assigning values to each element of the array
for (count = 0; count < 5; count++)
{
    evens[count] = 2 * count;
}
The above piece of code will construct an array evens as given below
0 2 4 6 8 Array values
0 1 2 3 4 Array index
Multi Dimensional Arrays

These are arrays which have more than one dimension. For example, the following declaration
in C creates a two-dimensional array of two rows and two columns:
int myArray1[2][2];
The following declaration creates an array of three dimensions, 2, 2, and 3:

int myArray2[2][2][3];
Initialization
In C, an array can be initialized with a brace-enclosed list when it is declared. The following
declarations initialize the two-dimensional array myArray1 and a second two-dimensional array,
myArray3, of three rows and two columns:
int myArray1[2][2] = {{1, 2}, {3, 4}};
int myArray3[3][2] = {{1, 2}, {3, 4}, {5, 6}};
In matrix form, the above arrays can be represented as below
myArray1
1 2
3 4
myArray3
1 2
3 4
5 6
Memory Organization in an array

Array elements occupy contiguous locations in memory. The array elements are accessed using
their index. A function is needed to translate an array index to the address of the indexed element.
For a single dimensional array the address can be calculated as below:
Address = Base Address + (Index – Base Index) * Size
Where,
Base Index represents the value of the first index in the array
Size represents the size of a single element in bytes
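The address formula above can be written directly as a C function. This is a sketch for illustration: the function name element_address and its parameter names are assumptions, and addresses are held in the integer type uintptr_t so that the arithmetic of the formula is visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Address = Base Address + (Index - Base Index) * Size */
uintptr_t element_address(uintptr_t base_address, long index,
                          long base_index, size_t size)
{
    return base_address + (uintptr_t)(index - base_index) * size;
}
```

For an array of 4-byte elements starting at address 1000 with a base index of 0, the element at index 3 is at 1000 + (3 - 0) * 4 = 1012. In C the compiler performs this calculation itself whenever an expression such as myArray[3] is evaluated.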
Advantages and disadvantages of an array
Advantages
Array data structure is simple to use.
Elements in an array are stored in contiguous memory locations and hence each
element can be accessed directly using their index.
Allocation and de-allocation of memory is done automatically by the computer.
Disadvantages
Elements in an array are stored in contiguous memory locations and hence array can
not be stored if the available memory is non contiguous. i.e. if the size of the array is n
bytes, then there should be n contiguous bytes available in memory.
The array size is fixed and hence the size of the array can not be reduced or
increased at run time based on the requirement.
Stacks
A stack is a homogeneous collection of items of any one type, arranged linearly with access at one
end only, known as the top. This means that data can be added or removed from only the top.
Formally this type of stack is called a Last In First Out (LIFO) stack. Data is added to the stack
using the Push operation, and removed using the Pop operation.
In order to clarify the idea of a stack here is an example. Think of a number of plates kept in a
cafeteria. When the plates are being stacked, they are added one on top of each other. It doesn't
make much sense to put each plate on the bottom of the pile, as that would be far more work.
Similarly, when a plate is taken, it is usually taken from the top of the stack.
Stack consists of two parts:

Storage space within the stack that contains the elements of the stack.
Top of stack, which refers to the most recently pushed element.
A stack can be implemented either using an array or a linked list.
Stack implementation using an array

Top is an integer value which tracks the array index for the top of the stack. Each time data is
pushed or popped, top is incremented or decremented accordingly, to keep track of the current top
of the stack. By one common convention, top holds the index of the topmost element and an empty
stack is indicated by setting top to -1. The code given below uses a slightly different convention:
top holds the index of the next free position, so an empty stack is indicated by top being equal to 0.
Stacks implemented as arrays are useful if a fixed amount of data is to be used. However, if the
amount of data is not a fixed size or the amount of the data fluctuates widely during the stack's life
time, then an array is a poor choice for implementing a stack.
Any recursive call is implemented with the help of a stack by the computer. The size of the stack
cannot be predicted in recursion, and implementing the stack using an array is a poor choice in this
case.
Algorithm to implement the operations using array
Push:
if (top >= total_no_elements)
    return (1); // Error code: the stack is full
else
{
    printf("\n Enter the element \n");
    scanf("%d", &stack[top]); // store at the next free position
    top++;
}
Pop:
if (top == 0)
{
    printf("\n STACK EMPTY \n");
}
else
{
    top--; // move back to the most recently pushed element
    printf("\n\nPopped element = %d\n", stack[top]);
}
Display:
if (top == 0)
{
    printf("\n STACK IS EMPTY \n");
}
else
{
    printf("\n The elements inside the stack are :\n");
    for (j = top - 1; j >= 0; j--) // from the top of the stack down to the bottom
    {
        printf("\n%d", stack[j]);
    }
}
Stack operations:
Operation | Description | Return type | Requirement
Push | This operation adds or pushes another item onto the stack. | Data type | The number of items on the stack is less than n.
Pop | This operation removes an item from the stack. | Data type | The number of items on the stack must be greater than 0.
Top | This operation returns the value of the item at the top of the stack. | Data type | Note: It does not remove that item.
Is Empty | This operation returns true if the stack is empty and false if it is not. | Boolean |
Is Full | This operation returns true if the stack is full and false if it is not. | Boolean |
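The five operations in the table can be collected into one small array-based stack. The following is a minimal sketch under stated assumptions: the type name Stack, the function names and the capacity STACK_CAPACITY are hypothetical, and failure is reported through a boolean return value rather than through printed messages. As in the code above, top is the index of the next free position, so top equal to 0 means the stack is empty.

```c
#include <stdbool.h>

#define STACK_CAPACITY 5            /* hypothetical fixed size n */

typedef struct {
    int items[STACK_CAPACITY];
    int top;                        /* index of the next free position */
} Stack;

void stack_init(Stack *s)           { s->top = 0; }
bool stack_is_empty(const Stack *s) { return s->top == 0; }
bool stack_is_full(const Stack *s)  { return s->top == STACK_CAPACITY; }

/* Push: returns false and stores nothing if the stack is full. */
bool stack_push(Stack *s, int value)
{
    if (stack_is_full(s))
        return false;
    s->items[s->top++] = value;
    return true;
}

/* Pop: removes the top item into *out; returns false if the stack is empty. */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[--s->top];
    return true;
}

/* Top: copies the top item into *out without removing it. */
bool stack_top(const Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top - 1];
    return true;
}
```

Pushing 1, 2 and 3 and then popping three times yields 3, 2 and 1, which is the Last In First Out order described earlier.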
Queues
A queue is a data structure in which elements are accessed from two different ends called Front and
Rear. The elements are inserted into a queue through the Rear end and are removed from the
Front end. The principle used in a queue is "First In First Out" or FIFO.
There are two basic operations associated with a queue: enqueue and dequeue.
Enqueue means adding a new item to the rear end of the queue. The rear end always points to the
most recently added element.
Dequeue refers to removing the item from the front end of the queue. In the array implementation
given later in this session, the front index points to the position of the most recently removed element.
Theoretically, a queue does not have a specific capacity. Regardless of how many elements are
already contained, a new element can always be added. It can also be empty, at which point
removing an element will be impossible until a new element has been added again.
A practical implementation of a queue using arrays does have some capacity limit. For a data
structure the executing computer will eventually run out of memory, thus limiting the queue size.
Queue overflow results from trying to add an element into a full queue and queue underflow
happens when trying to remove an element from an empty queue.
A queue consists of two major variables Front and Rear. Front refers to the first position of the
queue and Rear refers to the last position of the queue.
Types of queues
Circular queue
A circular queue is one in which the insertion of a new element is done at the very first location of
the queue if the last location of the queue is full. i.e. circular queue is one in which the first element
comes just after the last element.
A circular queue overcomes the problem of unutilized space in linear queues implemented as
arrays. A circular queue also has a Front and a Rear to keep track of the elements to be deleted
and inserted, and therefore to maintain the unique characteristic of the queue. The assumptions
made are:
1. Front will always be pointing to the first element
2. If Front=Rear, the queue is empty
3. Each time a new element is inserted into the queue the Rear is incremented by one.
4. Each time an element is deleted from the queue the value of Front is incremented by one
Example: Circular Queue
[Figure: five positions Q[0], Q[1], Q[2], Q[3] and Q[4] arranged in a ring]
Inserting and deleting elements

Insertion and deletion of elements in a circular queue is the same as that in a linear queue except
that whenever an element is deleted from the front of the queue, the rear pointer can be made to
point to the vacant position and the element can be inserted there once the queue is full.
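[Figure: Before insertion. Q[0] = 5, Q[1] = 10 and Q[2] = 20; Q[3] and Q[4] are empty]
[Figure: After inserting two elements 30 and 40, queue full. Q[0] = 5, Q[1] = 10, Q[2] = 20, Q[3] = 30 and Q[4] = 40]
[Figure: Deletion in a circular queue. Q[0] is empty; Q[1] = 10, Q[2] = 20, Q[3] = 30 and Q[4] = 40]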

Now Q[0] will be available in the queue for another insertion.
Double Ended Queues

Double ended queue is a homogeneous list of elements in which insertion and deletion operations
are performed from both the ends. They are also called as deque.
There are two types of deques – Input-restricted deques and Output-restricted deques
The major operations involved are:

Insertion of an element at the Rear end of the queue.
Deletion of an element from the Front end of the queue
Insertion of an element at the Front end of the queue
Deletion of an element from the Rear end of the queue
For an input-restricted deque, insertion is allowed at only one end, so all the above operations
except the third are valid. For an output-restricted deque, deletion is allowed at only one end, so
all the above operations except the fourth are valid.
Priority Queue
In priority queues, the items added to the queue have a priority associated with them which
determines the order in which they exit the queue. Items with highest priority are removed first.
A priority queue is an abstract data type supporting the following three operations:
add an element to the queue with an associated priority
remove the element from the queue that has the highest priority, and return it
(optionally) peek at the element with highest priority without removing it
The simplest way to implement a priority queue data type is to keep an associative array mapping
each priority to a list of elements with that priority
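The three priority queue operations can be sketched in C as follows. Note that this sketch does not use the associative array described above; as a simpler stand-in it keeps the items in an unsorted array and scans for the highest priority on removal. All names (PriorityQueue, pq_add, pq_remove, pq_peek, PQ_CAPACITY) are hypothetical, and a larger number is assumed to mean a higher priority.

```c
#include <stdbool.h>

#define PQ_CAPACITY 8               /* hypothetical capacity */

typedef struct { int value; int priority; } PQItem;
typedef struct { PQItem items[PQ_CAPACITY]; int count; } PriorityQueue;

void pq_init(PriorityQueue *pq) { pq->count = 0; }

/* Add an element with an associated priority; false if the queue is full. */
bool pq_add(PriorityQueue *pq, int value, int priority)
{
    if (pq->count == PQ_CAPACITY)
        return false;
    pq->items[pq->count].value = value;
    pq->items[pq->count].priority = priority;
    pq->count++;
    return true;
}

/* Index of the highest-priority item, or -1 if the queue is empty.
   Among equal priorities the earliest added item is chosen. */
static int pq_find_highest(const PriorityQueue *pq)
{
    int best = -1;
    int i;
    for (i = 0; i < pq->count; i++)
        if (best < 0 || pq->items[i].priority > pq->items[best].priority)
            best = i;
    return best;
}

/* Peek at the highest-priority element without removing it. */
bool pq_peek(const PriorityQueue *pq, int *out)
{
    int best = pq_find_highest(pq);
    if (best < 0)
        return false;
    *out = pq->items[best].value;
    return true;
}

/* Remove the highest-priority element and return it through *out. */
bool pq_remove(PriorityQueue *pq, int *out)
{
    int best = pq_find_highest(pq);
    int i;
    if (best < 0)
        return false;
    *out = pq->items[best].value;
    for (i = best; i < pq->count - 1; i++)   /* close the gap */
        pq->items[i] = pq->items[i + 1];
    pq->count--;
    return true;
}
```

In this sketch adding is O(1) while removing and peeking are O(N), because every removal scans all the stored items.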
Applications of queues
Round robin technique for processor scheduling uses the concept of queues
Railway ticket reservation center is designed using queues to store customer
information
Printer server routines are designed using queues
Scheduling and buffering queues

A queue is a natural data structure for a system to serve incoming requests. Most of the process
scheduling or disk scheduling algorithms in operating systems use queues. Computer hardware
like a processor or a network card also maintains buffers in the form of queues for incoming
resource requests. A stack-like data structure causes starvation of the first requests, and is not
applicable in such cases. A mailbox or port to save messages to communicate between two users
or processes in a system is essentially a queue-like structure.
Search space exploration

Like stacks, queues can be used to remember the search space that needs to be explored at one
point of time in traversing algorithms. Breadth first search of a graph uses a queue to remember
the nodes yet to be visited.
Implementation of queue using array

Inserting an element into a queue
// Advance rear circularly: wrap to 0 after the last valid index
if (rear == max_no_of_elements - 1)
    rear = 0;
else
    rear = rear + 1;
if (rear == front)
{
    // The queue is full: undo the advance of rear
    printf("QUEUE OVERFLOW \n");
    if (rear == 0)
        rear = max_no_of_elements - 1;
    else
        rear = rear - 1;
}
else
{
    printf("\n Enter the element which you want to insert :\n");
    scanf("%d", &x);
    queue[rear] = x;
}
Deletion of an element from a queue

if (front == rear)
    printf(" QUEUE UNDERFLOW \n ");
else
{
    // Advance front circularly, then read the element now at front
    if (front == (max_no_of_elements - 1))
        front = 0;
    else
        front = front + 1;
    x = queue[front];
}
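The insertion and deletion steps above can be combined into one self-contained circular queue. This is a sketch under assumptions: the type CircularQueue and the function names are hypothetical, the modulo operator replaces the explicit wrap-around tests, and overflow and underflow are reported through a boolean return value. It follows the same convention as the code above: the queue is empty when front equals rear, so one array position always stays unused.

```c
#include <stdbool.h>

#define MAX_NO_OF_ELEMENTS 5    /* array size; holds at most 4 elements */

typedef struct {
    int queue[MAX_NO_OF_ELEMENTS];
    int front;                  /* position just before the first element */
    int rear;                   /* position of the most recently added element */
} CircularQueue;

void cq_init(CircularQueue *q) { q->front = 0; q->rear = 0; }

bool cq_is_empty(const CircularQueue *q) { return q->front == q->rear; }

/* Enqueue: returns false on overflow (queue full). */
bool cq_enqueue(CircularQueue *q, int x)
{
    int next = (q->rear + 1) % MAX_NO_OF_ELEMENTS;
    if (next == q->front)
        return false;
    q->rear = next;
    q->queue[q->rear] = x;
    return true;
}

/* Dequeue: returns false on underflow (queue empty). */
bool cq_dequeue(CircularQueue *q, int *out)
{
    if (cq_is_empty(q))
        return false;
    q->front = (q->front + 1) % MAX_NO_OF_ELEMENTS;
    *out = q->queue[q->front];
    return true;
}
```

After four insertions the fifth is rejected as an overflow; once one element has been removed, the vacated position is reused by the next insertion, which is the behaviour that distinguishes a circular queue from a linear one.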
In a stack, each new data item is stored at the top of the stack. ‘Top’ points to the top of the stack
in the figure. When a new data is added, the data is stored in the Top position and the Top pointer
is increased.
Summary
An array is a collection of individual values of the same data type stored in adjacent
memory locations
A stack is a homogeneous collection of items of any one type, arranged linearly with
access at one end only, known as the top. The two major operations available for a stack
include push(adding an element) and pop(deleting an element)
A queue is a collection of items in which only the earliest added item may be accessed. Basic
operations are add (to the tail) or enqueue and delete (from the head) or dequeue.
The major variations for queues are double ended queue, circular queue and priority queue
Test your Understanding

1. The elements inserted in order A, B, C, D are traversed in stack as
a. ABCD
b. DCBA
c. ADCB
d. None of the above
2. The size of an array can be ---

a. Extended
b. Reduced
c. Either a or b
d. Neither a nor b
Answers
1. b
2. d
Session 4: Linked Lists
Learning Objectives
Define linked list
Implement linked list operations in your program
Linked lists
A linked list can be viewed as a group of items, each of which points to the item in its
neighbourhood. An item in a linked list is known as a node. A node contains a data part and one or
two pointer parts which contain the addresses of the neighbouring nodes in the list. A linked list is a
data structure that supports dynamic memory allocation and hence it solves the problems of using
an array.
Types of linked lists

The different types of linked lists include:
Singly linked lists
Circular linked lists
Doubly linked lists
Simple/Singly Linked Lists

In singly linked lists, each node contains a data part and an address part. The address part of the
node points to the next node in the list.
Node Structure of a linked list
Data part Link part
An example of a singly linked list can be pictured as shown below. Note that each node is pictured
as a box, while each pointer is drawn as an arrow. A NULL pointer is used to mark the end of the
list.
The head pointer points to the first node in a linked list

If head is NULL, the linked list is empty
A head pointer to a list
Possible Operations on a singly linked list

Insertion: Elements are added at any position in a linked list by linking nodes.
Deletion: Elements are deleted at any position in a linked list by altering the links of the
adjacent nodes.
Searching or Iterating through the list to display items.
To insert or delete items from any position of the list, we need to traverse the list starting from its
root till we get the item that we are looking for.
Implementation of a singly linked list
Creating a linked list

A node in a linked list is usually a structure in C and can be declared as
struct Node
{
    int info;
    struct Node *next;
}; //end struct
A node is dynamically allocated as follows:
struct Node *p;
p = malloc(sizeof(struct Node));
For creating the list, the following code can be used:

// input_value holds the first value, read before the loop starts
do
{
    Current_node = malloc(sizeof(struct Node));
    Current_node->info = input_value;
    Current_node->next = NULL;
    if (root_node == NULL) // the first node in the list
        root_node = Current_node;
    else
        previous_node->next = Current_node;
    previous_node = Current_node;
    scanf("%d", &input_value);
} while (input_value != -999);
The above given code will create the list by taking values until the user inputs -999.
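The same list-building loop can be written as a self-contained function that takes its values from an array instead of from the keyboard. This is a sketch for illustration: the function names list_create, list_length and list_free are hypothetical, and the check on the result of malloc is an addition that the code above leaves out.

```c
#include <stdlib.h>

struct Node {
    int info;
    struct Node *next;
};

/* Releases every node of the list. */
void list_free(struct Node *root_node)
{
    while (root_node != NULL) {
        struct Node *next_node = root_node->next;
        free(root_node);
        root_node = next_node;
    }
}

/* Builds a list holding values[0..count-1] in order and returns its root.
   Returns NULL if count is 0 or if memory allocation fails. */
struct Node *list_create(const int *values, size_t count)
{
    struct Node *root_node = NULL;
    struct Node *previous_node = NULL;
    size_t i;
    for (i = 0; i < count; i++) {
        struct Node *current_node = malloc(sizeof *current_node);
        if (current_node == NULL) {
            list_free(root_node);
            return NULL;
        }
        current_node->info = values[i];
        current_node->next = NULL;
        if (root_node == NULL)              /* the first node in the list */
            root_node = current_node;
        else
            previous_node->next = current_node;
        previous_node = current_node;
    }
    return root_node;
}

/* Counts the nodes by following the next pointers until NULL. */
size_t list_length(const struct Node *root_node)
{
    size_t length = 0;
    while (root_node != NULL) {
        length++;
        root_node = root_node->next;
    }
    return length;
}
```

Every node obtained with malloc must eventually be released with free, which is why list_free is part of the sketch.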
Inserting an element
After getting the position and element which needs to be inserted, the following code can be used
to insert an element to the list
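if (position == 1 || root_node == NULL)
{
    // The new node becomes the first node of the list
    Current_node->next = root_node;
    root_node = Current_node;
}
else
{
    counter = 2;
    temp_node = root_node;
    // Stop at the node before the given position, or at the last node
    while ((counter < position) && (temp_node->next != NULL))
    {
        counter++;
        temp_node = temp_node->next;
    }
    Current_node->next = temp_node->next;
    temp_node->next = Current_node;
}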
The following figure illustrates how a node is inserted at an intermediate position in the list.
To insert a node between two nodes
The following figure illustrates how a node is inserted at the beginning of the list.
To insert a node at the beginning of a linked list
Deleting an element
After getting the element to be removed, the following code can be used to remove the particular
element.
temp_node = root_node;
if (root_node == NULL)
    return; // empty list, nothing to delete
if (temp_node->info == input_element)
{
    // The first node holds the element
    root_node = root_node->next;
    free(temp_node);
    return;
}
// Stop at the node just before the one to be deleted
while (temp_node->next != NULL && temp_node->next->info != input_element)
    temp_node = temp_node->next;
if (temp_node->next != NULL)
{
    delete_node = temp_node->next;
    temp_node->next = delete_node->next;
    free(delete_node);
}
The following figures illustrate the deletion of an intermediate node and the deletion of the first
node from the list.
Deleting an intermediate node from a linked list
Deleting the first node
To display the elements of the list
temp_node = root_node;
while(temp_node != NULL)
{
printf("%d\t", temp_node->info);
temp_node = temp_node->next;
}
The following figure illustrates the above piece of code.
The effect of the assignment temp_node = temp_node->next
Efficiency and advantages of Linked Lists

Locating a position in a linked list takes about as many comparisons as in an array; the
advantage is that no items need to be moved after an insertion or deletion.
Unlike a fixed-size array, a linked list allocates memory only for the nodes it actually
holds, at the cost of one link per node.
Individual nodes need not be contiguous in memory.
Doubly Linked List

A more sophisticated kind of linked list is a doubly-linked list or a two-way linked list. In a doubly
linked list, each node has two links: one pointing to the previous node and one pointing to the next
node.
Node structure
Previous Link Data Next Link
An example of a doubly linked list
Implementation of a doubly linked list

Adding an element to the list
To add the first node
first_node->next = NULL;
first_node->data = input_element;
first_node->prev = NULL;
To add a node at the position specified
temp_node = first_node;
for ( counter = 0 ; counter < position-1 && temp_node->next != NULL ; counter++ )
{
    temp_node = temp_node->next;
}
/* link new_node in directly after temp_node */
new_node->next = temp_node->next;
new_node->prev = temp_node;
if ( temp_node->next != NULL )
    temp_node->next->prev = new_node;
temp_node->next = new_node;
Deleting a particular element from the list

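temp_node = first_node;
if ( temp_node->data == input_element )
{
    first_node = first_node->next;        /* remove the first node */
    if ( first_node != NULL )
        first_node->prev = NULL;
    free ( temp_node );
}
else
{
    while ( temp_node->next != NULL && temp_node->next->data != input_element )
        temp_node = temp_node->next;
    if ( temp_node->next != NULL )        /* element found */
    {
        delete_node = temp_node->next;
        temp_node->next = delete_node->next;
        if ( delete_node->next != NULL )
            delete_node->next->prev = temp_node;
        free ( delete_node );
    }
}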
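Because every node of a doubly linked list knows its predecessor, a node can be unlinked once it has been found, without tracking the previous node during the search. The following is a minimal sketch under that assumption; struct DNode, dlist_push_front and dlist_remove are hypothetical names introduced only for this example.

```c
#include <stdlib.h>

struct DNode {
    int data;
    struct DNode *prev;
    struct DNode *next;
};

/* Hypothetical helper: add a node at the front. Returns the node, or NULL on failure. */
struct DNode *dlist_push_front(struct DNode **first, int value)
{
    struct DNode *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->data = value;
    node->prev = NULL;
    node->next = *first;
    if (*first != NULL)
        (*first)->prev = node;
    *first = node;
    return node;
}

/* Hypothetical helper: remove the first node holding value.
   Returns 1 if a node was removed, 0 otherwise. */
int dlist_remove(struct DNode **first, int value)
{
    struct DNode *cur = *first;
    while (cur != NULL && cur->data != value)
        cur = cur->next;
    if (cur == NULL)
        return 0;

    if (cur->prev != NULL)
        cur->prev->next = cur->next;   /* predecessor skips the node */
    else
        *first = cur->next;            /* the node was the first one */
    if (cur->next != NULL)
        cur->next->prev = cur->prev;   /* successor points back past the node */
    free(cur);
    return 1;
}
```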
Circular Linked Lists

In a circularly-linked list, the first and final nodes are linked together. In other words, a
circularly-linked list can be seen as having no beginning or end. To traverse a circular linked
list, begin at any node and follow the list in either direction until you return to the original
node. This type of list is most useful in cases where you have one object in a list and wish to
see all other objects in the list.
The pointer pointing to the whole list is usually called the end pointer.
Singly-circularly-linked list
In a singly-circularly-linked list, each node has one link, similar to an ordinary singly-linked list,
except that the link of the last node points back to the first node. As in a singly-linked list, new
nodes can only be efficiently inserted after a node we already have a reference to. For this reason,
it's usual to retain a reference to only the last element in a singly-circularly-linked list, as this allows
quick insertion at the beginning, and also allows access to the first node through the last node's
next pointer. The following figure shows a singly circularly linked list.
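[Figure: nodes 10, 20, 30 and 40 linked in sequence, with the last node (40) linking back to the first node (10)]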
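The idea of keeping a reference only to the last node can be sketched as follows. This is an illustration under stated assumptions: struct CNode, clist_push_front and clist_push_back are hypothetical names, and an empty list is represented by a NULL last pointer.

```c
#include <stdlib.h>

struct CNode {
    int info;
    struct CNode *next;
};

/* Hypothetical helper: insert at the beginning, i.e. right after the last node.
   Returns 0 on success, -1 if memory could not be allocated. */
int clist_push_front(struct CNode **last, int value)
{
    struct CNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->info = value;
    if (*last == NULL) {
        node->next = node;             /* a single node points to itself */
        *last = node;
    } else {
        node->next = (*last)->next;    /* old first node follows the new one */
        (*last)->next = node;
    }
    return 0;
}

/* Hypothetical helper: insert at the end. The new node is linked in after
   the last node and then becomes the last node itself. */
int clist_push_back(struct CNode **last, int value)
{
    if (clist_push_front(last, value) != 0)
        return -1;
    *last = (*last)->next;
    return 0;
}
```

Both operations take a constant number of steps, and the first node is always reachable as (*last)->next.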
Doubly-circularly-linked list
In a doubly-circularly-linked list, each node has two links, similar to a doubly-linked list, except that
the previous link of the first node points to the last node and the next link of the last node points to
the first node. As in doubly-linked lists, insertions and removals can be done at any point with
access to any nearby node.
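The following figure illustrates a doubly circularly linked list.
[Figure: nodes 10, 20, 30 and 40 linked in both directions, with the first node (10) and the last node (40) linked to each other]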
Circularly-linked list vs. linearly-linked list

Circularly linked lists are useful to traverse an entire list starting at any point. In a linear linked list,
it is required to know the head pointer to traverse the entire list. The linear linked list cannot be
traversed completely with the help of an intermediate pointer.
Access to any element in a doubly circularly linked list is much easier than in a linearly linked list
since the particular element can be approached in two directions. For example to access an
element present in the fourth node of a circularly linked list having five elements, it is enough to
start from the last node and traverse the list in the reverse direction to get the value in the fourth
node.
Implementation of a circular linked list:

Creating the list
while (input_element != -999)
{
    new_node = (struct node *) malloc (sizeof (struct node));
    new_node->info = input_element;
    if (root_node == NULL)
        root_node = new_node;
    else
        (*last_node)->next = new_node;
    (*last_node) = new_node;
    scanf ("%d", &input_element);
}
if (root_node != NULL)
    (*last_node)->next = root_node;    /* close the circle */
return root_node;
Inserting elements into the list

After getting the position and value to be inserted, the following code can be used:
new_node = (struct node *) malloc (sizeof (struct node));
new_node->info = input_element;
if ((position == 1) || ((*root_node) == NULL))
{
    new_node->next = *root_node;
    *root_node = new_node;
    if ((*last_node) == NULL)
        *last_node = new_node;             /* first node of an empty list */
    (*last_node)->next = *root_node;       /* last node points to the new first node */
}
else
{
    temp_node = *root_node;
    counter = 2;
    while ((counter < position) && (temp_node->next != (*root_node)))
    {
        temp_node = temp_node->next;
        ++counter;
    }
    if (temp_node->next == (*root_node))
        *last_node = new_node;             /* inserted after the old last node */
    new_node->next = temp_node->next;
    temp_node->next = new_node;
}
Deleting an element from the list
The following code deletes the node at the front of the list:
if (*front_node != NULL)
{
    temp_node = *front_node;
    printf ("The item deleted is %d", temp_node->info);
    if (*front_node == *rear_node)
    {
        *front_node = *rear_node = NULL;    /* the list had a single node */
    }
    else
    {
        *front_node = (*front_node)->next;
        (*rear_node)->next = *front_node;   /* keep the list circular */
    }
    free (temp_node);
}
Stacks and queues using pointers
One disadvantage of using an array to implement a stack or queue is the wasted space: most of
the time, most of the space in the array is unused. A more elegant and economical
implementation of a stack or queue uses a linked list.
Here is a sketch of a linked-list-based stack that holds 1, then 5, and then 20 at the bottom:
1 5 20 NULL
Top
The list consists of three cells, each of which holds a data object and a link to another cell. A
variable, top, holds the address of the first cell in the list.
An empty stack looks like this:
Top NULL
Implementing stacks as linked lists gives flexibility in the number of nodes: because a linked
list is a dynamic data structure, the stack can grow or shrink as the program demands.
Algorithm to implement stack operations using pointers:
Push
node=(struct stack*)malloc(sizeof(struct stack));
printf("\n\n Enter the data ");
scanf("%d",&node->data);
node->link=top;
top=node;
Pop
if(top==NULL)
    return(1); //Error code
else
{
    node=top;
    printf("\n \n Item deleted is %d ",top->data);
    top=top->link;
    free(node); //release the removed node
}
Display
i=top;
if(top==NULL)
return(1); //Error code
else
{
printf(" \n\n ELEMENTS ARE : \n");
while(i!=NULL)
{
printf("%d\n\n",i->data);
i=i->link;
}
}
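The push and pop steps above can be collected into functions that do not read input or print. The following is a minimal sketch, assuming the struct stack layout used in the fragments; the function signatures and the convention of returning 0 on success and 1 on error are assumptions chosen to mirror the error code used above.

```c
#include <stdlib.h>

struct stack {
    int data;
    struct stack *link;
};

/* Hypothetical helper: push value onto the stack. Returns 0, or 1 on allocation failure. */
int push(struct stack **top, int value)
{
    struct stack *node = malloc(sizeof *node);
    if (node == NULL)
        return 1;
    node->data = value;
    node->link = *top;      /* new node sits above the old top */
    *top = node;
    return 0;
}

/* Hypothetical helper: pop the top item into *value. Returns 0, or 1 if the stack is empty. */
int pop(struct stack **top, int *value)
{
    if (*top == NULL)
        return 1;
    struct stack *node = *top;
    *value = node->data;
    *top = node->link;
    free(node);             /* release the removed node */
    return 0;
}
```

Pushing 20, then 5, then 1 produces the stack sketched earlier, with 1 on top and 20 at the bottom.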
Implementation of queues using lists is very similar to the implementation of stacks, except that in
this case items join the queue at the back and leave at the front. If the queue is represented by the
list [5, 2], adding a new item 3 will give the list [5, 2, 3]. In other words new items are added to the
end of the list. Removing an item from the queue will be done from the front.
A pictorial representation of a queue being implemented as a linked list is given below. The
variable rear points to the last item in the queue.
Front 5 2 3 NULL
Rear
Algorithm to represent queue operations using pointers
Inserting an element
new_element->link = NULL;
if (front==NULL)
front = new_element;
else
rear->link = new_element;
rear = new_element;
Deleting an element
temp = front;
if (temp != NULL) /* nothing to delete from an empty queue */
{
    front = front->link;
    free (temp);
}
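The two queue operations can be sketched as functions working on a small structure that holds both pointers. The names struct qnode, struct queue, enqueue and dequeue are hypothetical, and the return convention (0 on success, 1 on error) is an assumption.

```c
#include <stdlib.h>

struct qnode {
    int data;
    struct qnode *link;
};

struct queue {
    struct qnode *front;    /* items leave here */
    struct qnode *rear;     /* items join here */
};

/* Hypothetical helper: add value at the rear. Returns 0, or 1 on allocation failure. */
int enqueue(struct queue *q, int value)
{
    struct qnode *node = malloc(sizeof *node);
    if (node == NULL)
        return 1;
    node->data = value;
    node->link = NULL;
    if (q->front == NULL)
        q->front = node;        /* first item in an empty queue */
    else
        q->rear->link = node;
    q->rear = node;
    return 0;
}

/* Hypothetical helper: remove the front item into *value. Returns 0, or 1 if empty. */
int dequeue(struct queue *q, int *value)
{
    if (q->front == NULL)
        return 1;
    struct qnode *temp = q->front;
    *value = temp->data;
    q->front = temp->link;
    if (q->front == NULL)
        q->rear = NULL;         /* queue became empty */
    free(temp);
    return 0;
}
```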
Summary
A linked list is a collection of elements called nodes, each of which contains a data
portion and a pointer to the node following that one in the linear ordering of the list.
A singly linked list is a dynamic data structure which can grow and shrink depending
upon the operations made. It has a single pointer which points to the successive node
in the list.
A doubly linked list is one in which each node carries two links, which give access to
both the successor node and the predecessor node from a given node position. It
provides bi-directional traversing.
A circular linked list is one which has no end, i.e. the link field of the last node does
not point to NULL; rather, it points back to the beginning of the linked list.
Stacks and queues implemented with pointers use only as much space as their contents
need, unlike implementations based on fixed-size arrays.

Test your Understanding
1. The last node of a linear linked list ______.
a. Has the value null
b. Has a next reference whose value is null
c. Has a next reference which references the first node of the list
d. Cannot store any data
2. To delete a node N from a linear linked list, you will need to ______.
a. Set the link in the node that precedes N to link in the node that follows N
b. Set the link in the node that precedes N to link N
c. Set the link in the node that follows N to link in the node that precedes N
d. Set the link in N to link in the node that follows N
3. Write a function that removes all duplicate elements from a linear linked list.
4. Write a function to print the elements in reverse order of a singly linked list.
5. Write a function to find the largest element in a circular linked list.
Answers
1. b
2. a
Session 6: Sorting and Searching
Learning Objectives
Explain the concepts of sorting and searching
List the advantages of each technique
List the limitations of each technique
Sorting
Sorting refers to ordering data in an increasing or decreasing fashion according to some linear
relationship among the data items.
Sorting can be done on names, numbers and records. For example, it is relatively easy to
look up the phone number of a friend in a telephone directory because the names in the
phone book have been sorted into alphabetical order. This example clearly illustrates one of
the main reasons that sorting large quantities of information is desirable: sorting greatly
improves the efficiency of searching. If we were to open a phone book and find that the
names were not presented in any logical order, it would take an incredibly long time to look up
someone's phone number.
Sorting can be performed using several methods:
Selection Sort.
In this method, the successive elements are selected in order and are placed in their proper sorted
positions.
Insertion sort.
In this method, sorting is done by inserting elements into an existing sorted list. Initially, the sorted
list has only one element. Other elements are gradually added into the list in the proper position.
Bubble Sort.
In this method, the entire file is passed through several times. Each pass compares each
element with its successor and swaps the two if they are out of order.
Merge Sort.
In this method, the elements are divided into partitions until each partition is trivially sorted
(a single element). Then, these partitions are merged and the elements are properly positioned
to get a fully sorted list.
Quick Sort.
In this method, an element called pivot is identified and that element is fixed in its place by moving
all the elements less than that to its left and all the elements greater than that to its right.
Radix Sort.
In this method, sorting is done based on the place values of the number. In this scheme, sorting is
done on the less-significant digits first. When all the numbers are sorted on a more significant digit,
numbers that have the same digit in that position but different digits in a less-significant position
are already sorted on the less-significant position.
Heap Sort
In this method, the file to be sorted is interpreted as a binary tree. Array, which is a sequential
representation of binary tree, is used to implement the heap sort.
In this chapter, focus is given to bubble sort, quick sort and heap sort.
The basic premise behind sorting an array is that its elements start out in some random order and
need to be arranged from lowest to highest.
It is easy to see that the list

1, 5, 6, 19, 23, 45, 67, 98, 124, 401
is sorted, whereas the list

4, 1, 90, 34, 100, 45, 23, 82, 11, 0, 600, 345
is not. The property that makes the second one "not sorted" is that there are adjacent elements
that are out of order. The first item is greater than the second instead of less, and likewise the third
is greater than the fourth and so on. Once this observation is made, it is not very hard to devise a
sort that proceeds by examining adjacent elements to see if they are in order, and swapping them
if they are not.
Bubble Sort
This sorting technique is so named because its logic resembles bubbles rising in water. A
bubble is small when it forms at the bottom and grows as it moves up, i.e. bubbles are in
ascending order of size from the bottom to the top. The sorting method proceeds by scanning
through the elements one pair at a time, and swapping any adjacent pairs it finds to be out of
order.
Example 6.1
Input sequence: 34 8 64 51 32 21

Pass    Sequence after the pass    Number of swaps
------------------------------------------------------------------------
1       8 34 51 32 21 64           4
2       8 34 32 21 51 64           2
3       8 32 21 34 51 64           2
4       8 21 32 34 51 64           1
5       8 21 32 34 51 64           0
With n = 6 elements at most n - 1 = 5 passes are needed. Pass 5 makes no swaps, which
confirms that the sequence is sorted.
Each pass consists of comparing each element in the file with its successor (x[i] > x[i+1]) and
swapping the two elements if they are not in proper order. After pass i, the i-th largest element
is in its proper position within the sorted array, which is x[n-i] when the array is indexed from 0.
Bubble Sort - Algorithm

#define TRUE 1
#define FALSE 0
void bubble(int x[], int n)
{
    int hold, j, pass;
    int switched = TRUE;
    for (pass = 0; pass < n - 1 && switched == TRUE; pass++)
    {
        switched = FALSE;
        for (j = 0; j < n-pass-1; j++)
            if (x[j] > x[j+1])
            {
                switched = TRUE; /* swap x[j], x[j+1] */
                hold = x[j];
                x[j] = x[j+1];
                x[j+1] = hold;
            }
    } /* it stops if there is no swap in the pass */
}
In the first pass, n-1 pairs are compared and the largest item moves to the last position. On the
second pass, the second largest item will move to its correct position, and on the third pass
(stopping at item n-3) the third largest will be in place. It is this gradual filtration, or bubbling
of the larger items to the top end, that gives this sorting technique its name.
There are two ways in which the sort can terminate with everything in the right order. It could
complete by reaching the n-1st pass and placing the second smallest item in its correct position.
Alternatively, it could find on some earlier pass that nothing needs to be swapped. That is, all
adjacent pairs are already in the correct order. In this case, there is no need to go on to
subsequent passes, for the sort is complete already. If the list started in sorted order, this would
happen on the very first pass. If it started in reverse order, it would not happen until the last one.
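The effect of the early exit can be observed by counting passes. The sketch below is the same bubble sort with one addition: it returns the number of passes it actually made. The name bubble_passes and the pass counter are assumptions added for illustration.

```c
/* Bubble sort with the early-exit flag; returns the number of passes made. */
int bubble_passes(int x[], int n)
{
    int passes = 0;
    int switched = 1;

    for (int pass = 0; pass < n - 1 && switched; pass++) {
        switched = 0;
        for (int j = 0; j < n - pass - 1; j++) {
            if (x[j] > x[j + 1]) {
                int hold = x[j];       /* swap x[j] and x[j+1] */
                x[j] = x[j + 1];
                x[j + 1] = hold;
                switched = 1;
            }
        }
        passes++;                      /* one full scan completed */
    }
    return passes;
}
```

For an already sorted array of five elements the function makes a single pass, while a five-element array in reverse order needs all four passes.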
Quick Sort
In this sort an element called the pivot is identified and fixed in its place by moving all
the elements less than it to its left and all the elements greater than it to its right. Since this
partitions the element sequence into left, pivot and right, the method is referred to as sorting by
partitioning. Instead of moving a single element towards its place, a pair of elements is moved in
a single swap. This makes the sorting quick. After the partitioning, each of the sub-lists is sorted,
which will cause the entire array to be sorted.
void quickSort(int first, int last)
{
    int mid;
    if (first < last) /* if the part being sorted has more than one element */
    {
        mid = quickParition(first, last); /* pivot is now in its final place */
        if (mid-1 > first)
            quickSort(first, mid-1);
        if (mid+1 < last)
            quickSort(mid+1, last);
    }
    return;
}
The hardest part of quick sort is the partitioning of elements. The algorithm looks at the first
element of the array (called the "pivot"). It will put all of the elements which are less than the pivot
in the lower portion of the array and the elements higher than the pivot in the upper portion of the
array. When that is complete, it can put the pivot between those two sections and quick sort will be
able to sort the two sections separately.
The details of the partitioning algorithm depend on counters which are moving from the ends of the
array toward the center. Each will move until it finds a value which is in the wrong section of the
array (larger than the pivot and in the lower portion or less than the pivot and in the upper portion).
Those entries will be swapped to put them into their appropriate sections and the counters will
continue searching for out of place values. When the two counters cross, partitioning is complete
and the pivot can be swapped to its proper place between the two sections.
int quickParition(int first, int last)
{
    int mid_val, i, j;
    mid_val = data[first]; /* This is the pivot value */
    i = first+1;
    j = last;
    while (i <= j)
    {
        while ((i < last) && (data[i] <= mid_val))
            i++;
        while ((j >= first) && (data[j] > mid_val))
            j--;
        if (i < j)
            swap(i, j); /* exchange data[i] and data[j] */
        else
            i++;
    }
    if (j != first)
        swap(j, first); /* put the pivot between the two sections */
    return j;
}
Example: 6.2
Input sequence: 34,8,64,51,32,21
Square brackets are used to demarcate sub files yet to be sorted.
R1 R2 R3 R4 R5 R6 m n
[34 8 64 51 32 21] 1 6
[32 8 21] 34 [51 64] 1 3
[21 8] 32 34 [51 64] 1 2
[8] 21 32 34 [51 64] 1 1
8 21 32 34 [51 64] 5 6
8 21 32 34 51 [64] 6 6
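The listings above rely on a global array named data and a swap routine that are not shown. The following self-contained sketch follows the same partitioning scheme but passes the array as a parameter; the names quick_sort, quick_partition and swap_items are assumptions made for this example.

```c
/* Exchange two entries of the array. */
static void swap_items(int data[], int i, int j)
{
    int t = data[i];
    data[i] = data[j];
    data[j] = t;
}

/* Partition data[first..last] around the pivot data[first].
   Returns the final index of the pivot. */
int quick_partition(int data[], int first, int last)
{
    int pivot = data[first];
    int i = first + 1;
    int j = last;

    while (i <= j) {
        while (i < last && data[i] <= pivot)
            i++;                        /* skip items that belong on the left */
        while (j > first && data[j] > pivot)
            j--;                        /* skip items that belong on the right */
        if (i < j)
            swap_items(data, i, j);     /* both items are on the wrong side */
        else
            i++;
    }
    if (j != first)
        swap_items(data, j, first);     /* move the pivot between the sections */
    return j;
}

/* Sort data[first..last] in ascending order. */
void quick_sort(int data[], int first, int last)
{
    if (first < last) {
        int mid = quick_partition(data, first, last);
        quick_sort(data, first, mid - 1);
        quick_sort(data, mid + 1, last);
    }
}
```

Partitioning the input of Example 6.2 once gives 32 8 21 34 51 64, which matches the second row of the example when the rows are read with positions numbered from 1.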
Heap Sort
In heap sort the file to be sorted is interpreted as a binary tree. The sorting technique is
implemented using array, which is a sequential representation of binary tree. The positioning of a
node is given as follows
For a node at position i (positions are numbered from 1), the parent is at position i/2 using
integer division, the left child is at position 2i and the right child is at position 2i+1
(2i and 2i+1 <= n, otherwise the children do not exist).
Heap sort is a two stage method. In the first stage the tree representing the input data is converted
into a heap. A heap can be defined as a complete binary tree with the property that the value of
each node is at least as large as the value of its children nodes. This, in turn, gives the root of the
tree as the largest key. In the second stage the output sequence is generated in decreasing order
by outputting the root and restructuring the remaining tree into a heap.
Example 6.3
The list of numbers 34, 8, 64, 51, 32, 21 is arranged in an array initially as in the Input File of
the example given below. Here the value of n is 6, hence the last parent is at index 6/2 = 3.
Parent 64 (index 3) is compared with its largest child 21; since 64 > 21 it is retained in its
position. Parent 8 (index 2) is compared with its largest child 51 and the two are interchanged
since 8 < 51. Now root 34 (index 1) is compared with its largest child 64 and the two are
interchanged since 34 < 64. The result is shown as the Initial Heap.
    Input File              Initial Heap
        34                       64
      /    \                   /    \
     8      64               51      34
    / \    /                /  \    /
  51   32 21               8    32 21

In fig 6.3(a) given below, the first largest number 64 which was brought into root is interchanged
with the last element 21 (index 6) in the tree. For easy identification of arranged elements the edge
is removed from its parent. In fig 6.3(b) given below, the same procedure is followed to bring 51 to
root and is interchanged with the element in index 5. The same step is followed in fig 6.3(c) and fig
6.3(d) to get a sorted file as given in fig 6.3(e)
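Elements shown in square brackets have been detached from the heap and are in their final
positions.

    6.3 (a)                 6.3 (b)
        51                      34
      /    \                  /    \
    32      34              32      21
   /  \                    /
  8    21   [64]          8    [51] [64]

    6.3 (c)                 6.3 (d)
        32                      21
      /    \                  /
     8      21               8    [32]
   [34] [51] [64]          [34] [51] [64]

    6.3 (e) Sorted File
   [8] [21] [32] [34] [51] [64]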

Algorithm 6.3.1: Heap Sort implementation
The heap algorithm sorts the given set of numbers using the heap sort technique, where 'n' is
the number of elements and 'a' is the array representation of the elements in the input binary
tree, stored in positions 1 to n. The heap algorithm 6.3.1 calls the adjust algorithm 6.3.2 each
time heaping is needed.
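void heap(int a[], int n)
{
    int i, t;
    for (i = n/2; i >= 1; i--)    /* stage 1: build the initial heap */
    {
        adjust(a, i, n);
    }
    for (i = n; i >= 2; i--)      /* stage 2: move the root to its final place */
    {
        t = a[i];
        a[i] = a[1];
        a[1] = t;
        adjust(a, 1, i-1);
    }
}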
Algorithm 6.3.2
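int adjust(int x[], int i, int n)
{
    int item, j;
    j = 2 * i;                /* left child of i */
    item = x[i];
    while (j <= n)
    {
        if ((j < n) && (x[j] < x[j+1]))
            j = j + 1;        /* pick the larger child */
        if (item >= x[j])
            break;            /* heap property holds */
        x[j/2] = x[j];        /* move the child up */
        j = 2 * j;
    }
    x[j/2] = item;
    return 0;
}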
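The two algorithms can be combined into a small, self-contained sketch. It keeps the handout's convention that the elements occupy positions 1 to n, so position 0 of the array is unused; the name heap_sort is an assumption, and adjust is declared void here only to keep the example short.

```c
/* Sift x[i] down until the subtree rooted at i is a heap.
   Elements occupy x[1..n]; x[0] is unused. */
static void adjust(int x[], int i, int n)
{
    int item = x[i];
    int j = 2 * i;                      /* left child of i */

    while (j <= n) {
        if (j < n && x[j] < x[j + 1])
            j++;                        /* pick the larger child */
        if (item >= x[j])
            break;                      /* heap property holds */
        x[j / 2] = x[j];                /* move the child up */
        j *= 2;
    }
    x[j / 2] = item;
}

/* Sort a[1..n] in ascending order. */
void heap_sort(int a[], int n)
{
    for (int i = n / 2; i >= 1; i--)    /* stage 1: build the initial heap */
        adjust(a, i, n);

    for (int i = n; i >= 2; i--) {      /* stage 2: repeatedly remove the root */
        int t = a[i];
        a[i] = a[1];
        a[1] = t;
        adjust(a, 1, i - 1);
    }
}
```

A caller would store the data of Example 6.3 as {0, 34, 8, 64, 51, 32, 21}, where the leading 0 is only a placeholder, and call heap_sort with n = 6.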
Searching
Searching is a process of locating a particular element present in a given set of elements. The
element may be a record, a table, or a file.
A search algorithm is an algorithm that accepts an argument 'a' and tries to find an element whose
value is 'a'. The search for a particular element in a set is unsuccessful if that
element does not exist. A number of techniques are available for searching. Linear Search
and Binary Search techniques are discussed in this session.
Linear Search
In Linear Search the list is searched sequentially and the position is returned if the key element to
be searched is available in the list, otherwise -1 is returned. The search starts at
the beginning of an array and moves to the end, testing for a match at each item. All the elements
preceding the search element are traversed before the search element is traversed, i.e. if the
element to be searched is in position 10, all elements from 1 to 9 are checked before 10.
Algorithm : Linear search implementation
#include <stdbool.h>
bool linear_search ( int *list, int size, int key, int* rec )
{
    // Basic Linear search
    bool found = false;
    int i;
    for ( i = 0; i < size; i++ )
    {
        if ( key == list[i] )
            break;
    }
    if ( i < size )
    {
        found = true;
        *rec = i;   // store the index of the match through rec
    }
    return found;
}
The code searches for the element through a loop running from index 0 to size - 1. The loop can
terminate in one of two ways. If the index variable i reaches the end of the list, the loop condition
fails. If the current item in the list matches the key, the loop is terminated early with a break
statement. The algorithm then tests the index variable to see if it is less than size (thus the loop
was terminated early and the item was found), or not (and the item was not found).
Example 6.4
Assume the element 45 is searched from a sequence of sorted elements 12, 18, 25, 36, 45, 48,
50. The Linear search starts from the first element 12, since the value to be searched is not 12
(value 45), the next element 18 is compared and is also not 45, by this way all the elements before
45 are compared and when the index is 5, the element 45 is compared with the search value and
is equal, hence the element is found and the element position is 5.
List i Result of comparison

12 18 25 36 45 48 50 1 12 <> 45 : false
12 18 25 36 45 48 50 2 18 <> 45 : false
12 18 25 36 45 48 50 3 25 <> 45 : false
12 18 25 36 45 48 50 4 36 <> 45 : false
12 18 25 36 45 48 50 5 45 = 45 : true
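The description at the start of this section, which returns the position or -1, can be sketched directly. The name linear_search_index is hypothetical, and the function reports a zero-based index, so the element found at position 5 in Example 6.4 is reported as index 4.

```c
/* Return the zero-based index of the first occurrence of key in list,
   or -1 if key is not present. */
int linear_search_index(const int *list, int size, int key)
{
    for (int i = 0; i < size; i++) {
        if (list[i] == key)
            return i;       /* match found: stop scanning */
    }
    return -1;              /* reached the end without a match */
}
```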
Binary Search
A linear search may have to examine the entire list, and always does so when the element
searched for is not present. Small improvements can reduce the cost of traversing the data set,
but they only cover up the real problem with the algorithm: its work grows in proportion to the
size of the list. By thinking of the data in a different way, we can make speed improvements that
are much better than anything
linear search can guarantee. Consider a list in sorted order. It would work to search from the
beginning until an item is found or the end is reached, but it makes more sense to remove as much
of the working data set as possible so that the item is found more quickly. If we started at the
middle of the list we could determine which half the item is in (because the list is sorted). This
effectively divides the working range in half with a single test. This in turn reduces the time
complexity.
Algorithm:
bool Binary_Search ( int *list, int size, int key, int* rec )
{
    bool found = false;
    int low = 0, high = size - 1;
    while ( high >= low )
    {
        int mid = low + ( high - low ) / 2;   /* middle of the active range */
        if ( key < list[mid] )
            high = mid - 1;                   /* continue in the left part */
        else
        if ( key > list[mid] )
            low = mid + 1;                    /* continue in the right part */
        else
        {
            found = true;
            *rec = mid;                       /* store the index of the match */
            break;
        }
    }
    return found;
}
Example 6.5
Binary search is applied for data in example 6.4
The active part of search is underlined
List i j mid Result of comparison

12 18 25 36 45 48 50 1 7 4 45 > 36 : Right part
12 18 25 36 45 48 50 5 7 6 45 < 48 : Left part
12 18 25 36 45 48 50 5 6 5 45 = 45 : Found
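The three steps of this example can be reproduced with a variant that counts how many middle elements it examines. This is a sketch for illustration: the name binary_search_index and the optional probes counter are assumptions, and indexes are zero-based, so the element found at position 5 in the table is index 4 here.

```c
#include <stddef.h>

/* Return the zero-based index of key in the sorted array list, or -1.
   If probes is not NULL, *probes receives the number of elements examined. */
int binary_search_index(const int *list, int size, int key, int *probes)
{
    int low = 0;
    int high = size - 1;
    int count = 0;
    int result = -1;

    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the active range */
        count++;
        if (key < list[mid]) {
            high = mid - 1;                 /* continue in the left part */
        } else if (key > list[mid]) {
            low = mid + 1;                  /* continue in the right part */
        } else {
            result = mid;
            break;
        }
    }
    if (probes != NULL)
        *probes = count;
    return result;
}
```

Searching for 45 in the list of Example 6.4 examines 36, 48 and 45 in that order, the same three comparisons as in the table above.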
Method of search   Advantages                          Disadvantages
------------------------------------------------------------------------
Linear             Simple                              Less efficient, since its time
                   Elements need not be in order       complexity, O(n), is higher than
                                                       that of binary search
Binary             More efficient, since its time      Not as simple as linear search
                   complexity, O(log n), is lower      Elements must be in order
                   than that of linear search
Summary
Sorting is the process of arranging elements in either ascending or descending order. This
makes searching faster.
Bubble sort is a sort in which each element is compared with its adjacent
element and the largest value is moved to the last position.
Quick sort is a sort by partitioning. Instead of a single element, a pair of elements
is arranged in one swap.
Heap sort is a sort by heaping the elements in a tree. It works with the same
complexity in its worst, best and average cases.
In Linear search all the elements preceding the search element must be searched.
In Binary search the middle element is compared and only the left or the right part is
then checked, instead of all the elements.

Test your Understanding
1. Which of the following sorts works with the same complexity in all cases?
a. Heap sort
b. Quick sort
c. Merge sort
d. Bubble sort
2. Quick sort works better if the input elements are of

a. Sorted order
b. Jumbled order
c. Reverse order
d. All the above
Answers
1. a
2. b
Session 8: Trees
Learning Objectives
After completing this chapter, you will be able to
Describe a tree
Explain how a tree can be represented internally
Describe how a tree can be traversed
Overview:
The data structures discussed in the previous sessions, such as lists, stacks and queues, are all
linear data structures. A tree is one of several types of non-linear data structure.
A tree is a collection of nodes arranged in a hierarchical fashion, with a specially designated
node called the root. Every node except the root has a parent in the level above it.
The parent of a node is the node directly above it in the hierarchy. A node can
have exactly one parent, i.e. a node can be attached to exactly one node in the level above it.
Example 8.1
          A
       /  |  \
      B   C   D
      |  / \  |
      E F   G H
The following table depicts some of the important terminologies related to a general tree structure.

Term            Description                                        Example
------------------------------------------------------------------------
Node            An item or single element represented in a tree    A, B, C, ..., H
Root            Node that does not have any ancestors (parent      A
                or grandparent)
Sub tree        Internal nodes in a tree which have both an        B, C, D
                ancestor (parent) and a descendant (child)
Leaf            External nodes that do not have any descendant     E, F, G, H
                (child)
Edge            The line that depicts the connectivity between     (A-B), (A-C), ...
                two nodes
Path            Sequence of connected nodes                        A-B-E for E from root
Length          Number of nodes involved in the path               2 for E from B
Height          Length of the longest path from the root           3
Depth           Length of the path to that node from the root      2 for D
Degree of a     Number of children connected from that node        3 for A, 1 for B and D,
node                                                               2 for C and 0 for leaves
Degree of a     Degree of the node which has maximum degree        3 (since A has maximum
tree                                                               degree)
Some applications of trees are:

representing family genealogy
as the underlying structure in decision-making algorithms
to represent priority queues (a special kind of tree called a heap)
to provide fast access to information in a database (a special kind of tree called a b-
tree)
Binary Tree
A binary tree is a finite set of nodes which is either empty, or consists of a root and two disjoint
binary trees, called the left and right sub-trees. In other words, it can be defined as a tree in which
every node has a maximum degree of 2, i.e. a node can have at most two children.
A binary tree differs from a general tree in the following aspects:

A tree must have at least one node but a binary tree may be empty.
A tree may have any number of sub-trees but a binary tree can have at most two.
Example 8.2
[Figure: a binary tree whose second level holds B and C and whose third level holds D, F and G]
Full Binary tree: A binary tree in which all its leaf nodes are in the same level is called a full binary
tree.
Example 8.3
[Figure: a full binary tree whose second level holds B and C and whose third level holds D, E, F and G]
Complete Binary tree

A binary tree in which the array representation is contiguous without any null pointers in between is
a complete binary tree.
Example 8.4
        A
      /   \
     B     C
    / \
   D   E
Array representation of the above tree:
Index:  0  1  2  3  4
Value:  A  B  C  D  E
In a binary tree the maximum number of nodes at level i (the level of the root node is 1) is
$2^{i-1}$, and the maximum number of nodes up to and including level i is $2^{i} - 1$.
Example 8.5
For a binary tree such as the one in example 8.2:
Maximum number of nodes at level 2 is $2^{2-1} = 2$
Maximum number of nodes at level 3 is $2^{3-1} = 4$
Maximum number of nodes till level 2 is $2^{2} - 1 = 3$
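The two formulas are easy to check in code, since a power of two can be computed with a left shift. The function names below are hypothetical, and the sketch assumes that the level is at least 1 and smaller than the number of bits in an unsigned long.

```c
/* Maximum number of nodes at a given level (root is level 1): 2^(level-1). */
unsigned long max_nodes_at_level(unsigned level)
{
    return 1UL << (level - 1);
}

/* Maximum number of nodes from level 1 up to and including the given level: 2^level - 1. */
unsigned long max_nodes_up_to_level(unsigned level)
{
    return (1UL << level) - 1;
}
```

The second value is the sum of the first over all levels up to the given one, for example 1 + 2 + 4 = 7 for three levels.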
Skewed binary tree

A binary tree is a skewed binary tree, if it has only left child (skewed left) or only right (skewed
right) child for all its internal nodes.
Example 8.6
      A              A
     /                \
    B                  B
   /                    \
  D                      D
Skewed left         Skewed right
Tree Representation
A binary tree can be represented in two ways:

1. Array representation
2. Linked list representation
Array representation
The binary tree can be stored in an array as discussed in the heap sort: the node at position i
has its left child at position 2i and its right child at position 2i+1.
Linked list representation
Since a binary-tree node never has more than two children, a node can be represented with 3
fields: one field for the data in the node and the remaining two fields for the two child pointers.
Left child Data Right Child
Programming representation of node is as follows.

struct BinaryTreenode
{
    struct BinaryTreenode * leftChild;
    char data;
    struct BinaryTreenode * rightChild;
};
Many algorithms pertaining to tree structures usually involve a process in which each node of the
tree is “visited”, or processed, exactly once. Such a process is called a traversal.
Tree Traversals
A tree can be traversed in three different ways:
Inorder traversal
Preorder traversal
Postorder traversal.
In all the traversal types the order of left and right sub tree are not changed i.e. always the left sub
tree is traversed before the right sub tree. The type of traversal is decided based on the position of
the data.
In preorder traversal the data is traversed before its sub trees are traversed.
In post order traversal the data is traversed after its sub trees are traversed.
In inorder traversal the data is traversed between its sub trees.
Simple steps in traversals

Preorder traversal
o Visit the root
o Traverse the left sub-tree in preorder
o Traverse the right sub-tree in preorder
Inorder traversal
o Traverse the left sub-tree in inorder
o Visit the root
o Traverse the right subtree in inorder
Postorder traversal
o Traverse the left subtree in postorder
o Traverse the right subtree in postorder
o Visit the root
Example 8.7
            A
          /   \
         B     C
        / \   / \
       D   E F   G
            /
           H
          / \
         I   J
Inorder traversal : D B E A I H J F C G
Preorder traversal : A B D E C F H I J G
Postorder traversal : D E B I J H F G C A
Algorithms for the tree traversals
Inorder traversal
void inorder(struct btreenode *sr)
{
    if(sr!=NULL)
    {
        inorder (sr->left);
        printf("%d\n", sr->data);
        inorder (sr->right);
    }
}
Preorder traversal
void preorder(struct btreenode *sr)
{
    if(sr!=NULL)
    {
        printf("%d\n", sr->data);
        preorder (sr->left);
        preorder (sr->right);
    }
}
Postorder traversal
void postorder(struct btreenode *sr)
{
    if(sr!=NULL)
    {
        postorder (sr->left);
        postorder (sr->right);
        printf("%d\n", sr->data);
    }
}
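The three traversals can be tried on the tree of Example 8.7. The sketch below makes several assumptions: the shape of the tree is reconstructed from the traversal sequences listed in that example, the node holds a char, and the helper names (make_node, append_label, example_8_7 and the *_to functions) are hypothetical. Instead of printing, each traversal appends the visited labels to a string so that the result can be compared.

```c
#include <stdlib.h>
#include <string.h>

struct btreenode {
    char data;
    struct btreenode *left;
    struct btreenode *right;
};

static struct btreenode *make_node(char data, struct btreenode *left,
                                   struct btreenode *right)
{
    struct btreenode *n = malloc(sizeof *n);
    if (n == NULL)
        abort();                       /* keep the sketch short: give up on failure */
    n->data = data;
    n->left = left;
    n->right = right;
    return n;
}

/* Append one label to a NUL-terminated buffer that is assumed large enough. */
static void append_label(char *out, char c)
{
    size_t len = strlen(out);
    out[len] = c;
    out[len + 1] = '\0';
}

void inorder_to(const struct btreenode *sr, char *out)
{
    if (sr != NULL) {
        inorder_to(sr->left, out);
        append_label(out, sr->data);   /* visit between the sub-trees */
        inorder_to(sr->right, out);
    }
}

void preorder_to(const struct btreenode *sr, char *out)
{
    if (sr != NULL) {
        append_label(out, sr->data);   /* visit before the sub-trees */
        preorder_to(sr->left, out);
        preorder_to(sr->right, out);
    }
}

void postorder_to(const struct btreenode *sr, char *out)
{
    if (sr != NULL) {
        postorder_to(sr->left, out);
        postorder_to(sr->right, out);
        append_label(out, sr->data);   /* visit after the sub-trees */
    }
}

/* Build the tree of Example 8.7: A(B(D,E), C(F(H(I,J), -), G)). */
struct btreenode *example_8_7(void)
{
    struct btreenode *h = make_node('H', make_node('I', NULL, NULL),
                                    make_node('J', NULL, NULL));
    struct btreenode *b = make_node('B', make_node('D', NULL, NULL),
                                    make_node('E', NULL, NULL));
    struct btreenode *c = make_node('C', make_node('F', h, NULL),
                                    make_node('G', NULL, NULL));
    return make_node('A', b, c);
}
```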
Binary Search Tree (BST)

BST is a binary tree which has the following properties.
All elements stored in the left subtree of a node whose value is K have values less
than K. All elements stored in the right subtree of a node whose value is K have
values greater than or equal to K.
That is, a node's left child must have a key less than its parent, and a node's right
child must have a key greater than or equal to its parent.
The left and right sub trees of a node are also binary search trees.
Example 8.8
              63
           /      \
         47        71
        /  \      /  \
       6    54  67    84
                     /  \
                   79    91
Operations that can be performed on a BST are:

Creation
Insertion
Deletion
Searching
Creation
The first element in the list is made the root of the tree. The elements following the first are
placed in its left sub tree if they are less than the root and in its right sub tree if they are
greater than or equal to the root. In other words, creation is a combination of search and
insertion after the creation of the root node.
Searching
The search always starts from the root node. If the value being searched for is less than the value
of the current node, the left subtree is searched; if it is greater, the right subtree is searched.
The search continues until the value is found or until there is no branch left to follow, in which
case the value is not in the tree.
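The search described above can be sketched as a short loop. The node type bstnode and the function name bst_search are assumptions made for this illustration, not names taken from the handout.

```c
#include <stddef.h>

/* Hypothetical BST node (assumption for this sketch). */
struct bstnode {
    int data;
    struct bstnode *left, *right;
};

/* Returns the node holding key, or NULL when the search runs out of branches. */
struct bstnode *bst_search(struct bstnode *root, int key)
{
    while (root != NULL && root->data != key)
        root = (key < root->data) ? root->left : root->right;
    return root;
}
```

On the tree of Example 8.8, a search for 79 visits 63, 71, 84 and 79, while a search for 15 visits 63, 47 and 6 and then stops because 6 has no right branch.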
Insertion
Steps involved in inserting a node are
Search the tree for the value to be inserted (even though it is not in the tree) until the
search ends at a node x that has no branch to follow.
Insert the new node as the left child of x if the new value is less than the value of x;
otherwise insert it as the right child of x.
Example 8.9: Inserting 15 in BST

The search follows the path 63, 47, 6; the newly added node is shown in parentheses.
              63
            /    \
          47      71
         /  \    /  \
        6   54  67   84
         \          /  \
        (15)       79   91
15 is greater than 6 hence it is joined as its right child.
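A minimal recursive sketch of the insertion steps follows. The names bstnode and bst_insert are assumptions for illustration; keys equal to a node go to the right, matching the BST definition given earlier.

```c
#include <stdlib.h>

/* Hypothetical BST node (assumption for this sketch). */
struct bstnode {
    int data;
    struct bstnode *left, *right;
};

/* Inserts key below root and returns the (possibly new) root.
   If allocation fails the tree is left unchanged. */
struct bstnode *bst_insert(struct bstnode *root, int key)
{
    if (root == NULL) {                      /* the search ended here */
        struct bstnode *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->data = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->data)
        root->left = bst_insert(root->left, key);
    else
        root->right = bst_insert(root->right, key);
    return root;
}
```

Inserting 63, 47, 71, 6, 54, 67, 84, 79, 91 and then 15 in that order produces the tree of Example 8.9, with 15 as the right child of 6.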
Deletion
The node which has to be deleted is first searched for from the root to find its position. Deletion
is easiest if the node to be deleted is a leaf node: the link from its parent is disconnected in
order to delete that node.
If the node is a non-leaf node, deletion is carried out as below.
If the non-leaf node has a single subtree, then its child node takes its place.
If the non-leaf node has both left and right subtrees, then either its inorder predecessor or its
inorder successor takes its place (i.e. the greatest left descendant or the smallest right
descendant).
Example 8.10 : Deleting 71 from example 8.9

The search follows the path 63, 71; the node to be deleted is shown in parentheses.
              63
            /    \
          47     (71)
         /  \    /  \
        6   54  67   84
         \          /  \
         15        79   91
The node 71 is replaced either by its greatest left descendant (67) or by its smallest right
descendant (79).
Replaced by its left descendant:
              63
            /    \
          47      67
         /  \       \
        6   54       84
         \          /  \
         15        79   91
Replaced by its right descendant:
              63
            /    \
          47      79
         /  \    /  \
        6   54  67   84
         \             \
         15             91
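The deletion cases can be sketched as follows. This version always uses the inorder successor (the smallest right descendant) in the two-subtree case; using the predecessor would be equally valid. The names bstnode, bst_node and bst_delete are assumptions for illustration.

```c
#include <stdlib.h>

/* Hypothetical BST node (assumption for this sketch). */
struct bstnode {
    int data;
    struct bstnode *left, *right;
};

/* Helper that allocates a node with the given children; NULL on failure. */
struct bstnode *bst_node(int data, struct bstnode *l, struct bstnode *r)
{
    struct bstnode *n = malloc(sizeof *n);
    if (n != NULL) { n->data = data; n->left = l; n->right = r; }
    return n;
}

/* Deletes one node holding key (if any) and returns the new root. */
struct bstnode *bst_delete(struct bstnode *root, int key)
{
    if (root == NULL) return NULL;                    /* key not found */
    if (key < root->data) {
        root->left = bst_delete(root->left, key);
    } else if (key > root->data) {
        root->right = bst_delete(root->right, key);
    } else if (root->left != NULL && root->right != NULL) {
        /* Two subtrees: copy the smallest right descendant here,
           then delete that descendant from the right subtree. */
        struct bstnode *s = root->right;
        while (s->left != NULL) s = s->left;
        root->data = s->data;
        root->right = bst_delete(root->right, s->data);
    } else {
        /* Leaf or single subtree: the child (possibly NULL) takes its place. */
        struct bstnode *child = root->left ? root->left : root->right;
        free(root);
        return child;
    }
    return root;
}
```

Applied to the tree of Example 8.9, deleting 71 yields the variant in which 79 takes the place of 71.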
Advantage of a BST
Searching for a node in a BST is faster, since at each node only the left or the right subtree is
searched, from the root until the node is found, instead of comparing against all the nodes preceding it.
Disadvantage of a BST
The tree may become a skewed binary tree if the elements arrive in ascending (skewed right) or
descending (skewed left) order, which leads to more levels and slower searches.
Summary
A tree is a collection of nodes arranged in a hierarchical fashion
A binary tree is a tree with 2 as its maximum degree
A tree can be represented using either an array or a linked list
A binary tree can be traversed in 3 ways: preorder, inorder and postorder
A binary search tree is a binary tree in which all the left descendants of a node are
less than the node and all its right descendants are greater than or equal to it.
Test your Understanding

1. A complete binary tree is a tree in which ----
a. All the leaf nodes are in the same level
b. All the parent nodes have exactly two children
c. The representation is contiguous without any null branch in between
d. None of the above
2. Binary search tree must be a ----

a. Complete binary tree
b. Full binary tree
c. Either ‘a’ or ‘b’
d. Need not be ‘a’ or ‘b’
Answers
1. c
2. d
Session 10: Balanced trees and hashing
Learning Objectives
After completing this chapter you will be able to
Define a balanced tree
Identify how a balanced tree can be constructed from a Binary tree
Define hashing
List the advantages and disadvantages of Hashing
Overview:
Balanced trees are classified into two categories
Height Balanced tree
Weight Balanced tree
AVL Tree
An AVL tree is a height balanced Binary Search Tree. If the elements arrive almost in order, a
normal BST has many null branches, which leads to more levels and in turn to more space and
longer searches. An AVL tree solves this problem by re-balancing the height whenever an inserted
node upsets the balance. Whether re-balancing is needed is decided from the balancing factor.
Balancing factor
Balancing factor of each node is calculated by finding the difference in levels between the left and
right sub tree.
Balancing factor of X = height of left sub tree of X - height of right sub tree of X
If the balancing factor of all the nodes in the tree is within the range of -1 and 1, then the tree is
already in balanced form, otherwise balancing is needed.
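The balancing factor can be computed directly from the definition. The sketch below assumes a hypothetical node type bstnode and counts height in levels (an empty subtree has height 0, a single node has height 1); it recomputes heights on every call, which is simple but not efficient.

```c
#include <stddef.h>

/* Hypothetical BST node (assumption for this sketch). */
struct bstnode {
    int data;
    struct bstnode *left, *right;
};

/* Height in levels: 0 for an empty subtree, 1 for a single node. */
int height(const struct bstnode *n)
{
    if (n == NULL) return 0;
    int hl = height(n->left), hr = height(n->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balancing factor = height of left subtree - height of right subtree. */
int balance_factor(const struct bstnode *n)
{
    return n == NULL ? 0 : height(n->left) - height(n->right);
}
```

For the chain 50, 45, 30 (each node the left child of the previous one) the factors are 2, 1 and 0, so node 50 needs balancing.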
AVL Tree Rotations

As mentioned previously, an AVL Tree and the nodes it contains must meet strict balance
requirements to maintain its O(log n) search capabilities. These balance restrictions are
maintained using various rotation functions. Below is a diagrammatic overview of the four possible
rotations that can be performed on an unbalanced AVL Tree, illustrating the before and after states
of an AVL Tree requiring the rotation.
Example 10.1: LL Rotation
Example 10.2: RR Rotation
Example 10.3: LR Rotation
Example 10.4: RL Rotations
Inserting in an AVL Tree

Nodes are initially inserted into AVL Trees in the same manner as an ordinary binary search tree
(that is, they are always inserted as leaf nodes). After insertion, however, the insertion algorithm
for an AVL Tree travels back along the path it took to find the point of insertion, and checks the
balance at each node on the path. If a node is found that is unbalanced (that is, it has a balance
factor of either -2 or +2), then a rotation is performed based on the inserted node's position relative
to the node being examined (the unbalanced node).
NB. At most one rotation (single or double) is ever required after an insert operation.
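The insertion procedure can be sketched in C as below. This is one possible implementation under stated assumptions: each node caches its height (the handout stores a balancing factor instead), the type avlnode and all function names are illustrative, and the LR and RL cases are handled as two single rotations.

```c
#include <stdlib.h>

/* Hypothetical AVL node with a cached height in levels (assumption). */
struct avlnode {
    int data, height;
    struct avlnode *left, *right;
};

static int h(const struct avlnode *n) { return n ? n->height : 0; }

static void fix(struct avlnode *n)          /* recompute cached height */
{
    int a = h(n->left), b = h(n->right);
    n->height = 1 + (a > b ? a : b);
}

static struct avlnode *rot_right(struct avlnode *y)   /* repairs an LL case */
{
    struct avlnode *x = y->left;
    y->left = x->right;
    x->right = y;
    fix(y); fix(x);
    return x;
}

static struct avlnode *rot_left(struct avlnode *x)    /* repairs an RR case */
{
    struct avlnode *y = x->right;
    x->right = y->left;
    y->left = x;
    fix(x); fix(y);
    return y;
}

/* Inserts key as a leaf, then re-balances on the way back up the path. */
struct avlnode *avl_insert(struct avlnode *n, int key)
{
    if (n == NULL) {
        n = malloc(sizeof *n);
        if (n != NULL) { n->data = key; n->height = 1; n->left = n->right = NULL; }
        return n;
    }
    if (key < n->data) n->left = avl_insert(n->left, key);
    else               n->right = avl_insert(n->right, key);
    fix(n);
    int bf = h(n->left) - h(n->right);
    if (bf > 1) {                                      /* left side too tall */
        if (h(n->left->left) < h(n->left->right))      /* LR: rotate child first */
            n->left = rot_left(n->left);
        return rot_right(n);
    }
    if (bf < -1) {                                     /* right side too tall */
        if (h(n->right->right) < h(n->right->left))    /* RL: rotate child first */
            n->right = rot_right(n->right);
        return rot_left(n);
    }
    return n;
}
```

Inserting 50, 45, 30, 55, 63, 53 in that order triggers an LL, an RR and an RL repair and ends with 50 at the root, 45 and 55 below it, and 30, 53, 63 as leaves.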
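Example 10.5: Constructing an AVL tree for the list of elements 50, 45, 30, 55, 63, 53
Each node is written as data(balancing factor).
Insert 50, 45, 30: 50(2) has left child 45(1), which has left child 30(0). Node 50 is unbalanced.
LL rotation: root 45(0) with left child 30(0) and right child 50(0).
Insert 55: root 45(-1) with left child 30(0) and right child 50(-1); 50 has right child 55(0).
Insert 63: root 45(-2) with left child 30(0) and right child 50(-2); 50 has right child 55(-1), which has right child 63(0). Node 50 is unbalanced.
RR rotation: root 45(-1) with left child 30(0) and right child 55(0); 55 has children 50(0) and 63(0).
Insert 53: root 45(-2) with left child 30(0) and right child 55(1); 55 has children 50(-1) and 63(0); 50 has right child 53(0). Node 45 is unbalanced.
RL rotation: root 50(0) with left child 45(1) and right child 55(0); 45 has left child 30(0); 55 has children 53(0) and 63(0).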
Deletion in AVL tree

The deletion algorithm for AVL Trees is a little more complex, as there are several extra steps
involved in the deletion of a node. If the node is not a leaf node (that is, it has at least one child),
then the node must be swapped with either its in-order successor or predecessor (based on
availability). Once the node has been swapped we can delete it (and have its parent pick up any
children it may have - bear in mind that it will only ever have at most one child). If a deletion node
was originally a leaf node, then it can simply be removed.
Now, as with the insertion algorithm, we traverse back up the path to the root node, checking the
balance of all nodes along the path. If we encounter an unbalanced node we perform an
appropriate rotation to balance the node.
NB. Unlike the insertion algorithm, more than one rotation may be required after a delete
operation, so in some cases we will have to continue back up the tree after a rotation.
Weight Balanced Trees

Tree structures support various basic dynamic set operations including Search, Predecessor,
Successor, Minimum, Maximum, Insert, and Delete in time proportional to the height of the tree.
Ideally, a tree will be balanced and the height will be log n where n is the number of nodes in the
tree. To ensure that the height of the tree is as small as possible and therefore provide the best
running time, a balanced tree structure like a red-black tree, AVL tree, or b-tree must be used.
When working with large sets of data, it is often not possible or desirable to maintain the entire
structure in primary storage (RAM). Instead, a relatively small portion of the data structure is
maintained in primary storage, and additional data is read from secondary storage as needed.
Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly
slower than random access memory (RAM). In fact, the system often spends more time in
retrieving data than actually processing data.
B-trees are weight balanced trees that are optimized for situations when part or the entire tree
must be maintained in secondary storage such as a magnetic disk. Since disk accesses are
expensive (time consuming) operations, a b-tree tries to minimize the number of disk accesses.
For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion
keys but requires at most two disk accesses to search for any node
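The figure of over one billion keys can be checked with a short calculation, assuming every node is completely full (1000 keys and 1001 children per node) and that the root is at level 0:

```latex
\begin{aligned}
\text{keys per node} &= 1001 - 1 = 1000 \\
\text{nodes} &= 1 + 1001 + 1001^2 = 1 + 1001 + 1\,002\,001 = 1\,003\,003 \\
\text{total keys} &= 1000 \times 1\,003\,003 = 1\,003\,003\,000
\end{aligned}
```

If the root is kept in primary storage, reaching any key takes at most one disk access per lower level, that is, two.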
B-Trees
The Structure of B-Trees
Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children. The
keys are stored in non-decreasing order. Each key has an associated child that is the root of a
subtree containing all nodes with keys less than or equal to the key but greater than the preceding
key. A node also has an additional rightmost child that is the root for a subtree containing all keys
greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization
factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain
circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys.
Every node may have at most 2t - 1 keys or, equivalently, 2t children.
Since each node tends to have a large branching factor (a large number of children), it is typically
necessary to traverse relatively few nodes before locating the desired key. If access to each node
requires a disk access, then a b-tree will minimize the number of disk accesses required. The
minimization factor is usually chosen so that the total size of each node corresponds to a multiple
of the block size of the underlying storage device. This choice simplifies and optimizes disk
access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside
in primary storage and accesses to secondary storage are comparatively expensive (or time
consuming).
Height of B-Trees
For n greater than or equal to one, the height h of an n-key b-tree T with a minimum
degree t greater than or equal to 2 satisfies h <= log_t((n + 1) / 2).
The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to
many other balanced tree structures, the base of the logarithm tends to be large; therefore, the
number of nodes visited during a search tends to be smaller than required by other tree structures.
Although this does not affect the asymptotic worst case height, b-trees tend to have smaller
heights than other trees with the same asymptotic height.
Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these
algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees
strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass
approach will reduce the number of node visits and thus the number of disk accesses. Simpler
double-pass approaches that move back up the tree to fix violations are possible.
Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage
(memory), all references to a given node must be preceded by a read operation denoted by Disk-Read.
Similarly, once a node is modified and it is no longer needed, it must be written out to secondary
storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes
referenced in parameters have already had a corresponding Disk-Read operation. New nodes are
created and assigned storage with the Allocate-Node call. The implementation details of the Disk-
Read, Disk-Write, and Allocate-Node functions are operating system and implementation
dependent.
B-Tree-Search(x, k)
    i <- 1
    while i <= n[x] and k > key_i[x]
        do i <- i + 1
    if i <= n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return NIL
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)
The search operation on a b-tree is analogous to a search on a binary tree. Instead of choosing
between a left and a right child as in a binary tree, a b-tree search must make an n-way choice.
The correct child is chosen by performing a linear search of the values in the node. After finding
the value greater than or equal to the desired value, the child pointer to the immediate left of that
value is followed. If all values are less than the desired value, the rightmost child pointer is
followed. Of course, the search can be terminated as soon as the desired node is found. Since the
running time of the search operation depends upon the height of the tree, B-Tree-Search visits
O(log_t n) nodes.
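An in-memory C sketch of the search is given below. It is an illustration under assumptions: the node layout btnode, the constant T (minimum degree, set to 2 here) and the function name are not from the handout, indices are 0-based, and the Disk-Read step is replaced by following a pointer.

```c
#include <stddef.h>

#define T 2                      /* minimum degree t (assumed value) */

/* Hypothetical in-memory b-tree node (assumption for this sketch). */
struct btnode {
    int n;                       /* number of keys in use */
    int leaf;                    /* non-zero for a leaf */
    int key[2 * T - 1];          /* keys in non-decreasing order */
    struct btnode *c[2 * T];     /* children; c[i] holds keys below key[i] */
};

/* Returns the node containing k and stores the key's index in *idx,
   or returns NULL if k is not in the tree. */
struct btnode *btree_search(struct btnode *x, int k, int *idx)
{
    while (x != NULL) {
        int i = 0;
        while (i < x->n && k > x->key[i])   /* linear scan within the node */
            i++;
        if (i < x->n && k == x->key[i]) {
            *idx = i;
            return x;
        }
        if (x->leaf)
            return NULL;
        x = x->c[i];             /* a disk-based tree would Disk-Read here */
    }
    return NULL;
}
```

The loop replaces the tail recursion of the pseudocode; the sequence of nodes visited is the same.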
B-Tree-Create(T)
x <- Allocate-Node()
leaf[x] <- TRUE
n[x] <- 0
Disk-Write(x)
root[T] <- x
The B-Tree-Create operation creates an empty b-tree by allocating a new root node that has no
keys and is a leaf node. Only the root node is permitted to have these properties; all other nodes
must meet the criteria outlined previously. The B-Tree-Create operation runs in time O(1).
B-Tree-Split-Child(x, i, y)
    z <- Allocate-Node()
    leaf[z] <- leaf[y]
    n[z] <- t - 1
    for j <- 1 to t - 1
        do key_j[z] <- key_(j+t)[y]
    if not leaf[y]
        then for j <- 1 to t
            do c_j[z] <- c_(j+t)[y]
    n[y] <- t - 1
    for j <- n[x] + 1 downto i + 1
        do c_(j+1)[x] <- c_j[x]
    c_(i+1)[x] <- z
    for j <- n[x] downto i
        do key_(j+1)[x] <- key_j[x]
    key_i[x] <- key_t[y]
    n[x] <- n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)
If a node becomes "too full," it is necessary to perform a split operation. The split operation moves
the median key of node y into its parent x, where y is the i-th child of x. A new node, z, is allocated,
and all keys in y right of the median key are moved to z. The keys left of the median key remain in
the original node y. The new node, z, becomes the child immediately to the right of the median key
that was moved to the parent x, and the original node, y, becomes the child immediately to the left
of that median key.
The split operation transforms a full node with 2t - 1 keys into two nodes with t - 1 keys each. Note
that one key is moved into the parent node. The B-Tree-Split-Child algorithm will run in time O(t)
where t is constant.
B-Tree-Insert(T, k)
    r <- root[T]
    if n[r] = 2t - 1
        then s <- Allocate-Node()
             root[T] <- s
             leaf[s] <- FALSE
             n[s] <- 0
             c_1[s] <- r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)
B-Tree-Insert-Nonfull(x, k)
    i <- n[x]
    if leaf[x]
        then while i >= 1 and k < key_i[x]
                 do key_(i+1)[x] <- key_i[x]
                    i <- i - 1
             key_(i+1)[x] <- k
             n[x] <- n[x] + 1
             Disk-Write(x)
        else while i >= 1 and k < key_i[x]
                 do i <- i - 1
             i <- i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i <- i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)
To perform an insertion on a b-tree, the appropriate node for the key must be located using an
algorithm similar to B-Tree-Search. Next, the key must be inserted into the node. If the node is not
full prior to the insertion, no special action is required; however, if the node is full, the node must
be split to make room for the new key. Since splitting the node results in moving one key to the
parent node, the parent node must not be full or another split operation is required. This process
may repeat all the way up to the root and may require splitting the root node. This approach
requires two passes. The first pass locates the node where the key should be inserted; the second
pass performs any required splits on the ancestor nodes.
Since each access to a node may correspond to a costly disk access, it is desirable to avoid the
second pass by ensuring that the parent node is never full. To accomplish this, the presented
algorithm splits any full nodes encountered while descending the tree. Although this approach may
result in unnecessary split operations, it guarantees that the parent never needs to be split and
eliminates the need for a second pass up the tree. Since a split runs in linear time, it has little
effect on the O(t log_t n) running time of B-Tree-Insert.
Splitting the root node is handled as a special case since a new root must be created to contain
the median key of the old root. Observe that a b-tree will grow from the top.
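The single-pass insertion can be sketched in C for an in-memory tree. Everything here is illustrative: the node layout btnode, the minimum degree T = 2, the function names, 0-based indices, and the choice to abort on allocation failure are assumptions, and the Disk-Read and Disk-Write steps are omitted.

```c
#include <stdlib.h>

#define T 2                      /* minimum degree t (assumed value) */

/* Hypothetical in-memory b-tree node (assumption for this sketch). */
struct btnode {
    int n, leaf;
    int key[2 * T - 1];
    struct btnode *c[2 * T];
};

static struct btnode *bt_new(int leaf)
{
    struct btnode *x = calloc(1, sizeof *x);
    if (x == NULL) abort();      /* keep the sketch short: no recovery */
    x->leaf = leaf;
    return x;
}

/* Splits the full child y = x->c[i]; x itself must not be full. */
static void bt_split_child(struct btnode *x, int i)
{
    struct btnode *y = x->c[i], *z = bt_new(y->leaf);
    int j;
    z->n = T - 1;
    for (j = 0; j < T - 1; j++) z->key[j] = y->key[j + T];   /* right keys */
    if (!y->leaf)
        for (j = 0; j < T; j++) z->c[j] = y->c[j + T];       /* right children */
    y->n = T - 1;
    for (j = x->n; j > i; j--) x->c[j + 1] = x->c[j];        /* make room in x */
    x->c[i + 1] = z;
    for (j = x->n - 1; j >= i; j--) x->key[j + 1] = x->key[j];
    x->key[i] = y->key[T - 1];                               /* median moves up */
    x->n++;
}

static void bt_insert_nonfull(struct btnode *x, int k)
{
    int i = x->n - 1;
    if (x->leaf) {
        while (i >= 0 && k < x->key[i]) { x->key[i + 1] = x->key[i]; i--; }
        x->key[i + 1] = k;
        x->n++;
        return;
    }
    while (i >= 0 && k < x->key[i]) i--;
    i++;
    if (x->c[i]->n == 2 * T - 1) {           /* split full child on the way down */
        bt_split_child(x, i);
        if (k > x->key[i]) i++;
    }
    bt_insert_nonfull(x->c[i], k);
}

/* Inserts k and returns the (possibly new) root; a NULL root means an empty tree. */
struct btnode *bt_insert(struct btnode *root, int k)
{
    if (root == NULL) root = bt_new(1);
    if (root->n == 2 * T - 1) {              /* full root: the tree grows at the top */
        struct btnode *s = bt_new(0);
        s->c[0] = root;
        bt_split_child(s, 0);
        root = s;
    }
    bt_insert_nonfull(root, k);
    return root;
}
```

With T = 2 a node holds up to three keys, so inserting 10, 17, 25, 9, 13, 16, 8, 5, 15, 22 gives a root with keys 10 and 17 over the leaves (5 8 9), (13 15 16) and (22 25).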
B-Tree-Delete
Deletion of a key from a b-tree is possible; however, special care must be taken to ensure that the
properties of a b-tree are maintained. Several cases must be considered. If the deletion reduces
the number of keys in a node below the minimum of t - 1, this violation must be corrected by
borrowing a key from a sibling or by combining nodes, possibly reducing the height of the tree.
If the key is in an internal node, its children must be rearranged.
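B-Tree Insertion
Keys inserted in order: 10 17 25 9 13 16 8 5 15 22
In this example a node holds at most two keys; when a third key arrives the node is split and its
median key moves up to the parent. Each tree is written one level per line, with the keys of a
node in brackets.
Insert 10:
    [10]
Insert 17:
    [10 17]
Insert 25:
    [17]
    [10] [25]
Insert 9:
    [17]
    [9 10] [25]
Insert 13:
    [10 17]
    [9] [13] [25]
Insert 16:
    [10 17]
    [9] [13 16] [25]
Insert 8:
    [10 17]
    [8 9] [13 16] [25]
Insert 5:
    [10]
    [8] [17]
    [5] [9] [13 16] [25]
Insert 15:
    [10]
    [8] [15 17]
    [5] [9] [13] [16] [25]
Insert 22:
    [10]
    [8] [15 17]
    [5] [9] [13] [16] [22 25]
After deleting 16 from the above B-Tree
    [10]
    [8] [15 22]
    [5] [9] [13] [17] [25]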
Hashing
Hashing is a technique which improves the speed of search by calculating the address of the
search element directly using a mathematical formula instead of searching for it.
Symbol Table
A symbol table is a dictionary ADT used in a program. It is a set of names and attributes. The
characteristics of the names and attributes vary depending upon the application.
Name : Identifier
Attribute : Initial value, list of lines using that id, etc
The possible operations in a symbol table are

Search if a particular name is in table
Retrieve attribute of that name
Modify the name and attributes
Insert a new name and attributes
Delete a name and attributes
Hashing techniques are used to search, insert, and delete the items (name & attributes). Instead
of comparing identifiers to perform a search, the hashing technique uses a formula called the hash
function h(x).
The hashing technique can be classified into two types

Static Hashing
Dynamic Hashing
Static Hashing:
In Static hashing the identifiers are stored in a fixed sized table called the hash table. The table
size cannot be altered in this hashing.
Dynamic Hashing:
In dynamic hashing the identifiers are stored in a dynamic sized table called the hash table. The
table size can be altered in this hashing. The arithmetic function h(x) gives the address of x in the
table. The address is named as hash address or home address.
Overflow:
Overflow occurs when a new key k1 is mapped (hashed) into a bucket or table that is already
full, so that k1 cannot be inserted there.
Hash Collision:
A collision occurs when two different keys result in the same address after the hash function is
applied. Suppose that two keys k1 and k2 are such that h(k1) equals h(k2). Then when a record with
key k1 is entered into the table, it is inserted at position h(k1). But when k2 is hashed, because its
hash address is the same as that of k1, an attempt may be made to insert the record into the same
position where the record with key k1 is stored. Clearly, two records cannot occupy the same
position. Such a situation is called a hash collision or hash clash. Hash collisions can be resolved
through rehashing and double hashing.
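The handout does not spell out double hashing, so the following is only a sketch of one common form of it: when the home address is occupied, a second hash function supplies a step size and the table is probed at home, home + step, home + 2 * step and so on (modulo the table size). The table size 7, the second function 1 + key mod 5, the EMPTY marker and the function name are all assumptions chosen for illustration; keys are assumed to be non-negative.

```c
#define TABLE_SIZE 7             /* prime, so every step size visits all slots */
#define EMPTY (-1)               /* marker for an unused slot (assumption) */

/* Inserts a non-negative key and returns the slot used, or -1 on overflow. */
int dh_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;                 /* first hash: home address */
    int step = 1 + key % (TABLE_SIZE - 2);       /* second hash: never zero */
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        int pos = (home + i * step) % TABLE_SIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;                                   /* every slot is occupied */
}
```

With an empty table, 10 goes to its home slot 3; 17 also hashes to 3, collides, and is placed in slot 6 (step 3); 24 collides at 3 as well and is placed in slot 1 (step 5).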
There are several kinds of hash functions, four of them are

Mid - Square Method.
Division
Folding
Digit analysis (Radix)
Mid- Square Method.
In this method the key value of the id x is squared and the bits from the middle part of the square
are taken as the address. Since the middle bits of the square depend on all the digits of the key, the
addresses of different keys will usually differ even if some of their digits are the same.
A = 67, A^2 = 4489 (decimal) = 10611 (octal)
Mid(A^2) = 061 (octal) is the address.
In practice the address is calculated from the binary bits of the square.
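A bit-level sketch of the mid-square method follows. The parameters skip (how many low-order bits of the square to discard) and bits (how many bits to keep) are assumptions introduced for this illustration; the handout does not say how the middle is chosen.

```c
/* Squares key and returns `bits` bits of the square, starting `skip`
   bits above the least significant bit. Assumes key * key fits in an
   unsigned long and bits is smaller than the width of unsigned long. */
unsigned long mid_square(unsigned long key, unsigned skip, unsigned bits)
{
    unsigned long sq = key * key;
    return (sq >> skip) & ((1UL << bits) - 1UL);
}
```

For the example above, mid_square(67, 3, 9) drops the last octal digit of 10611 and keeps the next three, giving 061 (octal), which is 49 in decimal.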
Division
The key value is divided by a number M and the remainder is taken as the address of the id
h_D(x) = x mod M
The function returns a bucket address in the range 0 through M-1, so the hash table must have at
least b = M buckets. If M is a power of 2 then h_D(x) depends only on the least significant bits of x;
since programmers tend to use variable names with the same suffix, this results in many collisions.
If M is divisible by 2, then odd keys are mapped to odd buckets and even keys to even buckets. This
biases the hash table and increases collisions. These difficulties can be avoided by
choosing M to be a prime number, so that the only factors of M are M and 1.
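The effect of the choice of M can be seen with a few keys that share their low-order bits. The function name hash_division and the sample keys are assumptions for illustration.

```c
/* Division method: the remainder of x divided by m (m must be non-zero). */
unsigned hash_division(unsigned x, unsigned m)
{
    return x % m;
}
```

The keys 0x1A3, 0x2A3 and 0x3A3 end in the same bits, so with M = 16 all three land in bucket 3, while with the prime M = 17 they land in buckets 11, 12 and 13.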
Folding
The key x is divided into several parts, which are added together to get the final result of hashing.
Two types of folding methods are available:
Shift folding
Folding at the boundaries
In shift folding the parts are simply added together.
Example: 74568392
74 + 56 + 83 + 92
=> 305
Folding at the boundaries (Reverse Folding)
Parts in even positions are reversed and then the values are added together.
Example: 74 + 56 + 83 + 92
74 + 65 + 83 + 29
=> 251
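Both folding methods can be sketched for decimal keys split into two-digit parts. The part width of two digits, the limit MAX_PARTS and the function names are assumptions for this illustration; parts are numbered from the left, so the second, fourth, ... parts are the ones reversed.

```c
#include <stddef.h>

#define MAX_PARTS 16             /* enough for any unsigned long key */

/* Splits key into two-digit parts, leftmost part first; returns the count. */
static size_t split_parts(unsigned long key, unsigned parts[MAX_PARTS])
{
    unsigned tmp[MAX_PARTS];
    size_t n = 0, i;
    do {
        tmp[n++] = (unsigned)(key % 100);    /* peel parts off the right */
        key /= 100;
    } while (key != 0 && n < MAX_PARTS);
    for (i = 0; i < n; i++)
        parts[i] = tmp[n - 1 - i];
    return n;
}

/* Shift folding: the parts are simply added. */
unsigned shift_fold(unsigned long key)
{
    unsigned parts[MAX_PARTS], sum = 0;
    size_t n = split_parts(key, parts), i;
    for (i = 0; i < n; i++)
        sum += parts[i];
    return sum;
}

/* Folding at the boundaries: parts in even positions are digit-reversed. */
unsigned boundary_fold(unsigned long key)
{
    unsigned parts[MAX_PARTS], sum = 0;
    size_t n = split_parts(key, parts), i;
    for (i = 0; i < n; i++) {
        unsigned p = parts[i];
        if (i % 2 == 1)                      /* 2nd, 4th, ... part from the left */
            p = (p % 10) * 10 + p / 10;
        sum += p;
    }
    return sum;
}
```

For the key 74568392 shift folding gives 74 + 56 + 83 + 92 = 305 and folding at the boundaries gives 74 + 65 + 83 + 29 = 251.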
Digit Analysis:
This type of hashing is useful in the case of a static file, i.e. all the identifiers in a table are known
in advance. Each id x is interpreted as a number using some radix r. The same radix is used for all
ids in the table. Using this radix, the digits of each id are examined. Digits having the most skewed
distribution are deleted. Enough digits are deleted so that the number of remaining digits is small
enough to give an address in the range of the hash table.
To Manage Overflow
The size can be doubled, but this is wasteful
A new page can be added to the end and the ids divided between the original and the new
page, but this complicates the family of hash functions
The new id is joined as an overflow, the new page is created at the end and the ids of the first page
are rehashed. But sometimes no id from the first page will go to the new page, which results in a
non-uniform hash function. The pages to be rehashed (counted from page 1 according to the number
of new pages; that is, if n new pages are added then n pages from page 1 are rehashed) and the new
pages are addressed using 3 bits. The pages with overflow are addressed with r+1 bits and the pages
without overflow are retained with r bits.
Summary
Balanced Tree is a tree in which the number of levels is minimized by balancing the
height or weight.
AVL tree is a height balanced tree, balancing is done through four possible rotations.
B-tree is a weight balanced tree, balancing is done to maintain number of elements
and sub trees in each node.
Hashing is the process of calculating the address of the item using a mathematical
formula instead of searching.
Test your Understanding

1) In an AVL tree, if the balance factors are -2 and 2, the tree has to be rotated using
a) Right Left
b) Right Right
c) Left Right
d) Left Left
2) Which of the following is not a hashing method
a) Mid-Square
b) Radix
c) Folding
d) None of the above
Answers
1) a
2) d
Session 11: Graphs
Learning Objectives
Represent the graph using array and Linked list
Traverse the graph
Calculate minimum cost spanning tree
Calculate the shortest route from source to all other nodes
Graphs
Introduction
Graph is a collection of nodes or vertices connected together through edges or arcs. Graphs are
used to model electrical circuits, chemical compounds, highway maps, and so on. They are also
used in the analysis of electrical circuits, finding the shortest route, project planning, linguistics,
genetics, social science, and so forth.
Graph Definitions and Notations

A graph G is a pair, G = (V, E), where V is a finite nonempty set, called the set of vertices of G. E
is called the set of edges. Let V(G) denote the set of vertices, and E(G) denote the set of edges of
a graph G. If the elements of E(G) are ordered pairs, G is called a directed graph or digraph;
otherwise, G is called an undirected graph. In an undirected graph, the pairs (u, v) and (v, u)
represent the same edge.
Let G be a graph. A graph H is called a sub-graph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G); that is,
every vertex of H is a vertex of G, and every edge in H is an edge in G.
A graph can be shown pictorially. The vertices are drawn as circles, and a label inside the circle
represents the vertex. In an undirected graph, the edges are drawn using lines. In a directed
graph, the edges are drawn using arrows.
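Example: 11.1
[Figure: (a) an undirected graph and (b) a directed graph, each on the vertices A, B, C, D, E. The edges of both graphs are listed in the adjacency matrices of Example 11.2.]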
Let G be an undirected graph. Let u and v be two vertices in G. Then u and v are called adjacent if
there is an edge from one to the other; that is, (u, v) ∈ E. Let e = (u, v) be an edge in G. We then
say that edge e is incident on the vertices u and v. An edge incident on a single vertex is called a
loop. If two edges, e1 and e2, are associated with the same pair of vertices, then e1 and e2 are
called parallel edges. A graph is called a simple graph if it has no loops and no parallel edges.
There is a path from u to v if there is a sequence of vertices u1, u2, ..., un such that u = u1, un = v,
and (ui, ui + 1) is an edge for all i = 1, 2, ..., n – 1. Vertices u and v are called connected if there is a
path from u to v.
A simple path is a path in which all the vertices, except possibly the first and last vertices, are
distinct. A cycle in G is a simple path in which the first and last vertices are the same. G is called
connected if there is a path from any vertex to any other vertex. A maximal subset of connected
vertices is called a component of G. Let G be a directed graph, and let u and v be two vertices in
G. If there is an edge from u to v, that is, (u, v) ∈ E, then we say that u is adjacent to v and v is
adjacent from u. The definitions of the paths and cycles in G are similar to those for undirected
graphs. G is called strongly connected if any two vertices in G are connected.
Graph Representation
A graph can be represented in several ways. Two common ways: adjacency matrices and
adjacency lists.
Adjacency Matrix
Let G be a graph with n vertices, where n > 0. Let V(G) = {v1, v2, ..., vn}. The adjacency matrix AG
is a two dimensional n x n matrix such that the (i, j)th entry of AG is 1 if there is an edge from
vi to vj; otherwise, the (i, j)th entry is zero.
Example 11.2: Adjacency Matrix for graphs 11.1 (a) and (b)
(a) Undirected graph
    A B C D E
A   0 1 1 1 0
B   1 0 0 0 1
C   1 0 0 0 1
D   1 0 0 0 1
E   0 1 1 1 0
(b) Directed graph
    A B C D E
A   0 1 1 0 0
B   0 0 1 0 0
C   0 0 0 0 1
D   1 0 0 0 1
E   0 1 0 0 0
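Building an adjacency matrix from a list of edges can be sketched as follows. The constant NV, the function name add_edge and the directed flag are assumptions for illustration; vertices A to E are numbered 0 to 4.

```c
#define NV 5                     /* number of vertices, A..E as 0..4 (assumed) */

/* Records the edge (u, v); for an undirected graph the mirror entry is set too. */
void add_edge(int adj[NV][NV], int u, int v, int directed)
{
    adj[u][v] = 1;
    if (!directed)
        adj[v][u] = 1;
}
```

Starting from an all-zero matrix and adding the undirected edges A-B, A-C, A-D, B-E, C-E and D-E reproduces matrix (a) of Example 11.2; for a directed graph only the entry (u, v) is set, so the matrix need not be symmetric.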
Adjacency Lists
Let G be a graph with n vertices, where n > 0. Let V(G) = {v1, v2, ..., vn}. In the adjacency list
representation, corresponding to each vertex, v, there is a linked list such that each node of the
linked list contains the vertex u, such that (v, u) ∈ E(G). Because there are n vertices, we use an
array, A, of size n, such that A[i] is a reference variable pointing to the first node of the linked list
containing the vertices to which vi is adjacent. Each node has two components, say vertex and
link. The component vertex contains the index of the vertex adjacent to vertex i.
Example 11.3: Adjacency list of the undirected graph in example 11.1
A -> B -> C -> D
B -> A -> E
C -> A -> E
D -> A -> E
E -> B -> C -> D
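The array of linked lists can be sketched in C as below. The node type adjnode uses the two components named in the text (vertex and link); the function name adj_append is an assumption for illustration.

```c
#include <stdlib.h>

/* One entry of an adjacency list: an adjacent vertex and a link to the next entry. */
struct adjnode {
    int vertex;
    struct adjnode *link;
};

/* Appends u to the end of the list A[v]; returns 0 on success, -1 if out of memory. */
int adj_append(struct adjnode *A[], int v, int u)
{
    struct adjnode *n = malloc(sizeof *n);
    struct adjnode **p = &A[v];
    if (n == NULL) return -1;
    n->vertex = u;
    n->link = NULL;
    while (*p != NULL)           /* walk to the end of the list */
        p = &(*p)->link;
    *p = n;
    return 0;
}
```

For an undirected edge (v, u) the function is called twice, once for each direction. Appending 1, 2 and 3 to A[0] gives the first row of Example 11.3 with A to E numbered 0 to 4.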
Operations on Graphs
The operations commonly performed on a graph are as follows:
Create the graph. That is, store the graph in computer memory using a particular
graph representation.
Clear the graph. This operation makes the graph empty.
Determine whether the graph is empty.
Traverse the graph.
Print the graph.
How a graph is represented in computer memory depends on the specific application. For
illustration purposes, we use the adjacency list (linked list) representation of graphs. Therefore, for
each vertex, v, the vertices adjacent to v (in a directed graph, also called the immediate
successors) are stored in the linked list associated with v.
Graph Traversals
Processing a graph requires the ability to traverse the graph. Traversing a graph is similar to
traversing a binary tree, except that traversing a graph is a bit more complicated. Recall that a
binary tree has no cycles. Also, starting at the root node, we can traverse the entire tree. On the
other hand, a graph might have cycles and we might not be able to traverse the entire graph from
a single vertex (for example, if the graph is not connected). Therefore, we must keep track of the
vertices that have been visited. We must also traverse the graph from each vertex (that has not
been visited) of the graph. This ensures that the entire graph is traversed. The two most common
graph traversal algorithms are the depth first traversal and breadth first traversal, which are
described next. For simplicity, we assume that when a vertex is visited, its index is output.
Moreover, each vertex is visited only once. We use the bool array visited to keep track of the
visited vertices.
Depth First Traversal

The depth first traversal is similar to the preorder traversal of a binary tree. An initial or source
vertex is chosen to start the traversal. From the vertex visited last, one unvisited adjacent vertex
is visited next, and the traversal continues from that vertex; when a vertex has no unvisited
adjacent vertex, the traversal backs up to the previous vertex and tries its remaining adjacent vertices.
The general algorithm is:

for each vertex v in the graph
if v is not visited
start the depth first traversal at v
The general algorithm to do a depth first traversal at a given node v is:

1. Mark node v as visited
2. Visit the node
3. For each vertex u adjacent to v
    a. if u is not visited
    b. start the depth first traversal at u
Clearly, this is a recursive algorithm.
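The two algorithms above can be sketched in C. For brevity this sketch uses an adjacency matrix rather than the adjacency lists used in the text, and it records the visiting order in an array instead of printing it; NV and the function names are assumptions for illustration.

```c
#define NV 5                     /* number of vertices (assumed) */

/* Depth first traversal from v; visited vertices are appended to order[]. */
void dfs(int adj[NV][NV], int v, int visited[NV], int order[NV], int *count)
{
    int u;
    visited[v] = 1;                          /* 1. mark v as visited */
    order[(*count)++] = v;                   /* 2. visit the node */
    for (u = 0; u < NV; u++)                 /* 3. each unvisited neighbour */
        if (adj[v][u] && !visited[u])
            dfs(adj, u, visited, order, count);
}

/* Starts a traversal at every vertex not yet visited, so that a graph
   that is not connected is still covered completely. */
void dfs_all(int adj[NV][NV], int order[NV], int *count)
{
    int visited[NV] = {0}, v;
    *count = 0;
    for (v = 0; v < NV; v++)
        if (!visited[v])
            dfs(adj, v, visited, order, count);
}
```

On the undirected graph of Example 11.1 (A to E numbered 0 to 4, neighbours tried in index order) the order is 0, 1, 4, 2, 3, that is, A, B, E, C, D.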
Breadth First Traversal

The breadth first traversal of a graph is similar to traversing a binary tree level by level (the nodes
at each level are visited from left to right). All the nodes at any level, i, are visited before visiting the
nodes at level i + 1.
As in the case of the depth first traversal, because it might not be possible to traverse the entire
graph from a single vertex, the breadth first traversal also traverses the graph from each vertex
that is not visited. Starting at the first vertex, the graph is traversed as much as possible; we then
go to the next vertex that has not been visited. In other words, all vertices that are adjacent to
the current vertex are visited before any vertex farther away. To implement the breadth first search
algorithm, we use a queue.

a. for each vertex v in the graph
   if v is not visited
      add v to the queue // start the breadth first search at v
      b. Mark v as visited
      c. while the queue is not empty
         c.1. Remove vertex u from the queue
         c.2. Retrieve the vertices adjacent to u
         c.3. for each vertex w that is adjacent to u
              if w is not visited
                 c.3.1. Add w to the queue
                 c.3.2. Mark w as visited
Example 11.4 The depth first traversal of the undirected graph in example 11.1, starting at A
A, B, E, C, D
The breadth first traversal of the undirected graph in example 11.1, starting at A
A, B, C, D, E
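The breadth first algorithm can be sketched with an array used as the queue. As an assumption for brevity the graph is held in an adjacency matrix, and the visiting order is recorded in an array; NV and bfs_all are illustrative names.

```c
#define NV 5                     /* number of vertices (assumed) */

/* Breadth first traversal of the whole graph; visited vertices are
   appended to order[] and *count receives how many were visited. */
void bfs_all(int adj[NV][NV], int order[NV], int *count)
{
    int visited[NV] = {0};
    int queue[NV], head = 0, tail = 0;       /* each vertex is queued once */
    int v, u, w;
    *count = 0;
    for (v = 0; v < NV; v++) {
        if (visited[v]) continue;
        queue[tail++] = v;                   /* start a search at v */
        visited[v] = 1;
        while (head < tail) {
            u = queue[head++];               /* remove u from the queue */
            order[(*count)++] = u;
            for (w = 0; w < NV; w++)
                if (adj[u][w] && !visited[w]) {
                    queue[tail++] = w;
                    visited[w] = 1;
                }
        }
    }
}
```

On the undirected graph of Example 11.1 (A to E numbered 0 to 4) the order is 0, 1, 2, 3, 4, that is, A, B, C, D, E.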
Shortest Path Algorithm

Shortest path can be calculated only for the weighted graphs. The edges connecting two vertices
can be assigned a nonnegative real number, called the weight of the edge. A graph with such
weighted edges is called a weighted graph.
Let G be a weighted graph. Let u and v be two vertices in G, and let P be a path in G from u to v.
The weight of the path P is the sum of the weights of all the edges on the path P, which is also
called the weight of v from u via P.
Let G be a weighted graph representing a highway structure. Suppose that the weight of an edge
represents the travel time. For example, to plan monthly business trips, a salesperson wants to
find the shortest path (that is, the path with the smallest weight) from her or his city to every other
city in the graph. Many such problems exist in which we want to find the shortest path from a given
vertex, called the source, to every other vertex in the graph. This section describes the shortest
path algorithm, also called the greedy algorithm, developed by Dijkstra.
Shortest Path
Let vertex denote the given source vertex. The shortest path algorithm is as follows:
1. Initialize the array smallestWeight so that
   smallestWeight[u] = weights[vertex, u].
   (If there is no edge from vertex to u, the weight is treated as infinite.)
2. Set smallestWeight[vertex] = 0 and mark vertex as a vertex for which the smallest weight
   is found.
3. Find the vertex, v, that is closest to vertex among those for which the shortest path has
   not been determined.
4. Mark v as the (next) vertex for which the smallest weight is found.
5. For each vertex w in G, such that the shortest path from vertex to w has not been
   determined and an edge (v, w) exists, if the weight of the path to w via v is smaller than its
   current weight, update the weight of w to the weight of v + the weight of the edge (v, w).
Because there are n vertices, repeat Steps 3 through 5 n - 1 times.
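The five steps above can be sketched in C as follows. This is one possible reading of the steps, not a definitive implementation. It assumes integer weights in a matrix weights[][], a sentinel NO_EDGE for missing edges, and that the source is marked as determined before the loop starts; the names shortest_path, found, NO_EDGE and MAX_VERTICES are hypothetical.

```c
#include <limits.h>

#define MAX_VERTICES 10
#define NO_EDGE INT_MAX   /* assumption: marks a missing edge in weights[][] */

/* Fills smallestWeight[u] with the weight of the shortest path from
   vertex (the source) to u, for a graph with n vertices.
   Unreachable vertices keep the value NO_EDGE. */
void shortest_path(int n, int weights[MAX_VERTICES][MAX_VERTICES],
                   int vertex, int smallestWeight[])
{
    int found[MAX_VERTICES] = {0};   /* 1 if the smallest weight is final */

    /* Steps 1 and 2 */
    for (int u = 0; u < n; u++)
        smallestWeight[u] = weights[vertex][u];
    smallestWeight[vertex] = 0;
    found[vertex] = 1;               /* assumption: the source starts as determined */

    /* Steps 3 to 5, repeated n - 1 times */
    for (int i = 0; i < n - 1; i++) {
        /* Step 3: closest vertex whose shortest path is not yet determined */
        int v = -1;
        for (int u = 0; u < n; u++)
            if (!found[u] && smallestWeight[u] != NO_EDGE &&
                (v == -1 || smallestWeight[u] < smallestWeight[v]))
                v = u;
        if (v == -1)
            break;                   /* the remaining vertices are unreachable */

        found[v] = 1;                /* Step 4 */

        /* Step 5: try to improve each undetermined neighbour w via v */
        for (int w = 0; w < n; w++)
            if (!found[w] && weights[v][w] != NO_EDGE &&
                smallestWeight[v] + weights[v][w] < smallestWeight[w])
                smallestWeight[w] = smallestWeight[v] + weights[v][w];
    }
}
```

For a four-vertex graph with edges A-B (1), A-C (2), A-D (5), B-D (2) and C-D (1) and source A, the array ends up holding 0, 1, 2 and 3 for A, B, C and D respectively.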
Example 11.5: Shortest Path
[Figure: weighted graph with vertices A, B, C and D. Edges: A-B (1), A-C (2), A-D (5), B-D (2), C-D (1).]
SOURCE: A

Direct cost:
Vertex   Cost   Path
B        1      A-B
C        2      A-C
D        5      A-D

Select A-B. Costs adjusted from B:
Vertex   Cost   Path
B        1      A-B
C        2      A-C
D        3      A-B-D
Therefore A-B-D (3) < A-D (5)

Select A-C. No entry improves:
Vertex   Cost   Path
B        1      A-B
C        2      A-C
D        3      A-B-D
The shortest path to D remains A-B-D (3) < A-D (5)
Minimum Spanning Trees

A spanning tree of a graph, G, is a set of |V|-1 edges that connect all vertices of the graph.
Suppose we have a group of islands that we wish to link with bridges so that it is possible to travel
from one island to any other in the group. Further suppose that (as usual) our government wishes
to spend the absolute minimum amount on this project (because other factors like the cost of
using, maintaining, etc, these bridges will probably be the responsibility of some future
government). The engineers are able to produce a cost for a bridge linking each possible pair of
islands. The set of bridges which will enable one to travel from any island to any other at minimum
capital cost to the government is the minimum spanning tree.
In general, it is possible to construct multiple spanning trees for a graph, G. If a cost, c_ij, is
associated with each edge, e_ij = (v_i, v_j), then the minimum spanning tree is the set of edges,
E_span, forming a spanning tree, such that:
C = \sum_{e_{ij} \in E_{span}} c_{ij}

is a minimum.
Kruskal's Algorithm
This algorithm creates a forest of trees. Initially the forest consists of n single node trees (and no
edges). At each step, we add the cheapest remaining edge that joins two trees together. If an edge
were to form a cycle, it would simply link two nodes that were already part of a single connected
tree, so that edge is not needed.
The steps are:

1. Construct a forest, with each node in a separate tree.
2. Place the edges in a priority queue.
3. Until we've added n-1 edges,
   i. Continue extracting the cheapest edge from the queue, until we find one that does
      not form a cycle,
   ii. Add it to the forest. Adding it to the forest will join two trees together.
Every step joins two trees in the forest together, so that, at the end, only one tree remains: the
minimum spanning tree.
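A minimal C sketch of these steps follows, under stated assumptions. In place of a priority queue it sorts the edge array once with qsort, which yields the edges in the same cheapest-first order. To test whether an edge would form a cycle it keeps a parent array in which every tree of the forest has one root (a simple union-find structure); this technique is swapped in for illustration and is not described in the handout. The names kruskal, struct edge, find_root and parent are hypothetical.

```c
#include <stdlib.h>

#define MAX_VERTICES 10

struct edge { int u, v, cost; };

static int parent[MAX_VERTICES];   /* parent[x] == x means x is the root of its tree */

/* Follows parent links to the root of the tree that contains x. */
static int find_root(int x)
{
    while (parent[x] != x)
        x = parent[x];
    return x;
}

/* qsort comparator: orders edges by increasing cost. */
static int compare_cost(const void *a, const void *b)
{
    const struct edge *ea = a, *eb = b;
    return (ea->cost > eb->cost) - (ea->cost < eb->cost);
}

/* Builds a minimum spanning tree of a graph with n vertices and m edges.
   Sorts edges[] in place, stores the chosen edges in tree[], writes their
   total cost to *total, and returns the number of edges chosen
   (n - 1 if the graph is connected). */
int kruskal(int n, struct edge edges[], int m, struct edge tree[], int *total)
{
    int count = 0;
    *total = 0;

    for (int i = 0; i < n; i++)       /* step 1: each node in a separate tree */
        parent[i] = i;

    qsort(edges, m, sizeof edges[0], compare_cost);   /* step 2: cheapest first */

    for (int i = 0; i < m && count < n - 1; i++) {    /* step 3 */
        int ru = find_root(edges[i].u);
        int rv = find_root(edges[i].v);
        if (ru == rv)
            continue;                 /* same tree: the edge would form a cycle */
        parent[ru] = rv;              /* join the two trees */
        tree[count++] = edges[i];
        *total += edges[i].cost;
    }
    return count;
}
```

Two vertices belong to the same tree exactly when find_root returns the same root for both, so an edge whose end points share a root is skipped.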
The following sequence of diagrams illustrates Kruskal's algorithm in operation.
Example 11.6 Kruskal’s Algorithm
[Figure: weighted graph with vertices A, B, C, D and E]
First edge A-C is selected
Second edge B-E is selected
Third edge A-D is selected
Fourth edge C-D is selected
[Figures: the forest after each selection]
Summary
A graph is a collection of nodes connected together by edges.
A graph can be traversed using DFS or BFS.
The shortest paths from a vertex to the other vertices can be calculated using Dijkstra’s
algorithm.
A spanning tree is a connected acyclic subgraph that contains every vertex of the graph.
Minimum cost spanning trees can be derived using Kruskal’s algorithm.

Test your Understanding
1. In an undirected graph of n nodes, if the number of edges is ----- the graph is a complete
graph
a. 4n*(n-1)
b. n*(n-1)/2
c. n
d. 2n
2. The drawback of using array representation is ----

a. Memory is poorly utilized if the number of edges is small
b. Cannot find the in-degree and out-degree of a node
c. Both ‘a’ and ‘b’
d. Neither ‘a’ nor ‘b’
Answers
1. b
2. a
Glossary
Abstract data type: A formal, language-independent description of data elements, the relationships among them, and the operations that act upon them.
Acyclic graph: A graph without any cycles.
Adjacency list: A method that uses linked lists to represent the edges of a graph or network.
Adjacency matrix: A method that uses a matrix to represent the edges of a graph or network.
Adjacent: Two nodes of a graph are adjacent if they are connected by an edge.
Arc: The edge of a graph that establishes a directional orientation between its end points.
AVL tree: A tree in which, for each node, the difference between the height of its left sub tree and the height of its right sub tree is at most one.
Balance factor: For a node in a binary tree, the difference between the height of its left sub tree and the height of its right sub tree.
Big-O analysis: A technique in which the time and space requirements of an algorithm are estimated in order-of-magnitude terms.
Binary search: The process of examining the middle value of a sorted array to see which half contains the value in question, and continuing to halve until the value is located.
Binary search tree: A binary tree with the ordering property.
Binary tree: A tree in which each node has at most two sub trees.
Breadth first search: A visiting of all nodes in a graph; it proceeds from each node by first visiting all nodes adjacent to that node.
B-tree: An efficient, flexible index structure often used in DBMS on random access files.
Bubble sort: Rearranges the elements of an array, by repeatedly swapping adjacent elements that are out of order, until they are in either ascending or descending order.
Bucket: In bucket hashing, a contiguous region of storage locations.
Children: Nodes pointed to by an element in a tree.
Circular linked list: A linked list in which the last node of the list points to the first node in the list.
Clustering: Occurs when a collision resolution strategy causes keys that have a collision at an initial hashing position to be relocated to the same region within the storage space.
Collision: Condition in which more than one key hashes to the same position with a given hashing function.
Cycle: A path of a graph which originates and terminates at the same node.
Data structure: An abstraction of the elementary data types provided by a language.
Degree: The number of edges of a graph or tree for which the node is an end point.
Depth first search: A visiting of all nodes in a graph; it proceeds from each node by first visiting one node adjacent to that node.
Digraph/Directed graph: A graph in which each edge establishes a directional orientation between its end points.
Dijkstra’s algorithm: An algorithm for finding the shortest paths from a source node to the other nodes of a weighted graph.
Directed path: A sequence of directed edges from one node of a graph to another. Each pair of successive edges in the path shares an end point.
Double hashing: A collision processing method in which a second hashing function is used to determine a sequence of storage locations to examine until an available spot is found.
Double linked list: A linked list in which each node has two pointers instead of one. One pointer points to the node preceding that node and the other points to the node following that node in the list.
Edge: Establishes a link between two nodes.
Folding: A method of constructing a hashing function in cases where the key is not an integer value. The non-numeric characters are removed and the remaining digits are combined to produce an integer value.
Full binary tree: A binary tree in which all the leaf nodes are in the same level.
Graph: A structure composed of two sets of objects: a set of nodes and a set of edges.
Hashing: A density-dependent search technique in which the key for a given data item is transformed using a hash function to produce the address.
Heap sort: A sort in which the array is treated like the array implementation of a binary tree and the items are repeatedly manipulated to create a heap from which the root is removed and added to the sorted portion of the array.
Height balancing: A technique for ensuring that an ordered binary tree remains as full as possible in form.
In-degree of a node: The number of directed edges that terminate at the node.
In-order traversal: A binary tree traversal in which the order of traversal is left sub tree, node and right sub tree.
Kruskal’s algorithm: An algorithm for finding the minimum spanning tree of a graph.
Leaf: In a tree, a node that has no children.
Length: The number of edges in a path.
Level: All nodes in a tree whose paths from the root node are the same length.
Linear search: The process of examining the first element in a list and proceeding to examine the elements in order until a match is found.
Link: A pointer from one node to another.
Linked list: A collection of nodes connected through pointers in a linear fashion. Each node is divided into two parts: data and link.
Minimum spanning tree: A collection of edges connecting all of the nodes of a graph without any cycle and with the minimum total weight.
Node: A structure storing a data item in a linked list, tree, graph, and so on.
Out-degree of a node: The number of directed edges that originate at the node.
Overflow: In linked collision processing, the area in which keys that cause collisions are placed.
Parent: In a tree, the node that points to its children.
Partition: In quick sort, the process of moving the pivot to the location where it belongs in the sorted array and arranging the remaining data items to the left of the pivot if they are less than or equal to the pivot and to the right if they are greater than or equal to the pivot.
Path: A sequence of edges that connects two nodes in a graph.
Pivot: The item used to direct the partitioning of quick sort.
Pointer: A memory location containing the location of another data item.
Post order traversal: A binary tree traversal in which the nodes are traversed in the order of left sub tree, right sub tree and the node.
Preorder traversal: A binary tree traversal in which the nodes are traversed in the order of node, left sub tree and right sub tree.
Priority queue: A queue in which deletion is done by priority.
Queue: A data structure in which the elements are added at one end and removed from the other end. Referred to as FIFO (first in, first out).
Quick sort: A relatively fast sorting technique that uses recursion and a partition algorithm.
Rehashing: A method of handling a collision in which a sequence of new hashing functions is applied to the key that caused the collision until an available location for that key is found.
Sibling: Children of the same node.
Stack: A data structure in which the elements are accessed from one end. Referred to as LIFO (last in, first out).
Tree: A collection of nodes arranged in a hierarchical fashion.
Weight: A numeric value associated with an edge in a graph.
Weight balancing: Maintaining the number of elements that can be handled in a single node of a tree.
STUDENT NOTES: