Version: DSC/Handout/0307/2.1
Date: 05-03-07
Cognizant
500 Glen Pointe Center West
Teaneck, NJ 07666
Ph: 201-801-0233
www.cognizant.com
Data Structures with C
TABLE OF CONTENTS
Introduction
About this Document
Target Audience
Objectives
Pre-requisite
©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved
C3: Protected
Glossary
References
Websites
Books
Introduction
Target Audience
In-Campus Trainees
Objectives
Acquire basic knowledge of data structures
Select the appropriate data structures for an application
Analyze the complexity of algorithms
Implement data structures in C
Pre-requisite
Participants must have a basic knowledge of writing programs in C.
Learning Objectives
After completing this chapter, you will be able to:
Define a data structure
List the types of data structures
Identify how to analyze and select a data structure for a particular application
Overview
The study of computer science involves the study of how data is organized, manipulated and used in a
computer in order to improve the efficiency of the processor and memory.
Data is represented in memory in the form of binary digits. A binary digit, called a bit, is the basic
unit of data; a bit can represent either a zero or a one.
Data type
A data type defines the specification of a set of data and the characteristics of that data. A data
type is derived from the basic nature of the data that is stored for processing rather than from its
implementation.
Data Structure
Data structure refers to the actual implementation of the data type and offers a way of storing data
in an efficient manner. Any data structure is designed to organize data to suit a specific purpose so
that it can be accessed and worked on in appropriate ways, both effectively and efficiently. In
computer programming, a data structure may be selected or designed to store data so that various
algorithms can work on it.
The choice of a data structure begins with the choice of an abstract data type. Data structures are
implemented using the data types, references and operations on them that are provided by a
programming language.
Linear structures
A data structure is said to be linear if its elements form a sequence or a linear list.
Hash table
A hash table, or hash map, is a data structure that associates keys with values. A function called
the hash function is applied to the key to compute the address of the record.
Selection of an abstract data structure is crucial in the design of efficient algorithms and in
estimating their computational complexity, while selection of concrete data structures is important
for efficient implementation of algorithms. The names of many abstract data structures and
abstract data types match the names of concrete data structures.
In the design of many types of programs, the choice of data structures is a primary design
consideration, as experience in building large systems has shown that the difficulty of
implementation and the quality and performance of the final result depend heavily on choosing
the best data structure.
Average-case and worst-case performance are the measures most commonly used in algorithm
analysis. Best-case performance is used less widely; it is usually measured to improve the accuracy
of an overall worst-case analysis. Computer scientists use probabilistic analysis techniques,
especially expected value, to determine expected average running times.
Worst case performance analysis and average case performance analysis have similarities, but
usually require different tools and approaches in practice.
Determining what "average input" means is difficult. Complexity is analyzed over inputs in general,
and depending on the nature of the input, the average-case equations can be hard to analyze, which
makes the average-case complexity difficult to characterize mathematically.
Worst-case analysis has similar problems: it is typically difficult to determine the exact worst-case
scenario. Instead, a scenario is considered that is at least as bad as the worst case. For example,
when analyzing an algorithm, it may be possible to find the longest possible path through the
algorithm even if the exact input that produces that path is unknown.
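To make the best, worst and average cases concrete, here is a sketch that counts key comparisons in a linear search over n elements (the function name and the comparison counter are illustrative, not from the handout). The best case is 1 comparison (key at index 0), the worst case is n (key in the last position or absent), and if the key is equally likely to be at any position, the average is (n + 1) / 2 comparisons.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of key in a[0..n-1], or -1 if it is absent.
   *comparisons receives the number of key comparisons performed. */
int linear_search(const int a[], size_t n, int key, size_t *comparisons)
{
    assert(comparisons != NULL);
    *comparisons = 0;
    for (size_t i = 0; i < n; i++) {
        (*comparisons)++;
        if (a[i] == key)
            return (int)i;
    }
    return -1;
}
```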
It is important to evaluate the efficiency of an algorithm with respect to the following:
CPU (time) usage
Memory usage
Disk usage
Network usage
Measurement of complexity
Big O notation (also called Big Oh notation) expresses an upper bound on how the time required by
an algorithm grows as the size of its input increases. It is denoted using the symbol ‘O’. It is used in
the analysis of the complexity of algorithms to characterize a function's behavior for large inputs in
a simple way.
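For reference, the standard formal definition behind the notation (not stated in the original text) is given below, with a small worked example showing how the constants c and n_0 can be chosen.

```latex
f(n) = O\big(g(n)\big) \iff \exists\, c > 0,\ \exists\, n_0 \ge 1 \text{ such that } 0 \le f(n) \le c\, g(n) \text{ for all } n \ge n_0

3n^2 + 5n + 2 \le 3n^2 + 5n^2 + 2n^2 = 10n^2 \quad \text{for all } n \ge 1,
\text{ so } 3n^2 + 5n + 2 = O(n^2) \text{ with } c = 10,\ n_0 = 1
```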
Determination of complexities
The complexity of an algorithm depends on the statements used in it. The complexity for different
types of statements is given below.
Sequence of statements
Statement 1;
Statement 2;
...
Statement n; // none of the statements are loops; all are independent statements
If each statement is simple, the time taken by each statement is constant, and since the number of
statements is fixed, the total time is also constant. The complexity is therefore O(1).
Loops
for (condition)
    Sequence of simple statements;
Here, if the loop executes N times, the complexity is N * O(1), which is equivalent to O(N).
Nested loops
for (condition 1)
    for (condition 2)
        Sequence of simple statements;
Here, if the outer loop executes N times and the inner loop executes M times, the complexity is
N * M * O(1), that is, O(N*M).
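These rules can be checked by counting how often the innermost simple statement runs. In this sketch (the function names are illustrative), a single loop of N iterations executes its body N times, giving O(N), and nested loops of N and M iterations execute the inner body N * M times, giving O(N*M).

```c
#include <assert.h>

/* Counts executions of the body of a single loop of n iterations: O(N). */
long count_single_loop(int n)
{
    assert(n >= 0);
    long steps = 0;
    for (int i = 0; i < n; i++)
        steps++;                 /* stands for the "sequence of simple statements" */
    return steps;
}

/* Counts executions of the inner body of nested loops of n and m iterations:
   N * M * O(1), that is, O(N*M). */
long count_nested_loops(int n, int m)
{
    assert(n >= 0 && m >= 0);
    long steps = 0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            steps++;
    return steps;
}
```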
Summary
A data structure is the actual implementation of a data type and offers a way of storing
data efficiently.
An Abstract Data Type (ADT) is a data type together with its operations, whose
properties are specified independently of any particular implementation.
The different types of data structures available are:
o Linear
o Hash table
o Trees
o Graphs
A well-designed data structure allows a variety of critical operations to be performed
using as few resources, both execution time and memory space, as possible.
Big O notation is used to analyze the complexity of algorithms.
Answers
1. c
2. b
Session 2: Arrays
Learning Objectives
After completing this chapter, you will be able to:
Define arrays
Use arrays as data structures
Overview
An array is a collection of individual values of the same data type stored in consecutive (contiguous)
memory locations.
An array index (the position of an element in the array) usually starts from 0. Some languages allow
the starting index to be specified; in C, array indexing always starts from 0.
Here is an array of integers, myArray:
Array values:          13   5  12   3   6
Array positions/Index:  0   1   2   3   4
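Declaring an array in C
int CArray[10];
The above statement declares CArray as an array of 10 integers, indexed from 0 to 9.
myArray[1] = 5;
The above statement assigns the value 5 to the element at position 1 (the second element) of the
array myArray.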
The above statement prints the 5th element in myArray (index 4), i.e., it prints the value 6.
The above piece of code constructs an array evens as shown below:
Array values: 0  2  4  6  8
Array index:  0  1  2  3  4
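The statements of this section can be combined into one small sketch. The loop that fills evens is one possible way to build the array shown above, and the helper get_element is a hypothetical addition: C does not check array bounds, so the helper asserts that the index is valid before reading.

```c
#include <assert.h>
#include <stdio.h>

#define SIZE 5

/* The array pictured above: values 13 5 12 3 6 at indices 0..4. */
int myArray[SIZE] = {13, 5, 12, 3, 6};
int evens[SIZE];

/* C performs no bounds checking, so this helper asserts that the index is valid. */
int get_element(const int a[], int n, int index)
{
    assert(index >= 0 && index < n);
    return a[index];
}

void array_demo(void)
{
    myArray[1] = 5;                                  /* position 1 = second element */
    printf("%d\n", get_element(myArray, SIZE, 4));   /* 5th element (index 4): prints 6 */

    for (int i = 0; i < SIZE; i++)                   /* builds evens = 0 2 4 6 8 */
        evens[i] = 2 * i;
}
```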
Initialization
The following piece of code initializes the two-dimensional arrays myArray1 and myArray2:
int myArray1[2][2] = {{1, 2}, {3, 4}};
int myArray2[3][2] = {{1, 2}, {3, 4}, {5, 6}};
In matrix form, the above arrays can be represented as follows:
myArray1
1 2
3 4
myArray2
1 2
3 4
5 6
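The address of element A[i] of a one-dimensional array is computed as:
Address of A[i] = Base Address + (i - Base Index) * Size
Where,
Base Address represents the address of the first element of the array
Base Index represents the value of the first index in the array
Size represents the size of a single element in bytes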
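In C the base index is always 0 and Size is sizeof the element type, so the formula can be checked against the address the compiler computes for &a[i]. The helper element_address below is a hypothetical illustration of the formula, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>

/* Address of element i using: Base Address + (i - Base Index) * Size.
   In C the base index is 0 and Size is sizeof(int) for an int array. */
const char *element_address(const int *base, size_t i)
{
    assert(base != NULL);
    const size_t base_index = 0;
    return (const char *)base + (i - base_index) * sizeof(int);
}
```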
Advantages
The array data structure is simple to use.
Elements in an array are stored in contiguous memory locations, so each
element can be accessed directly using its index.
Memory for an array is allocated and de-allocated automatically.
Disadvantages
Elements in an array are stored in contiguous memory locations, so an array cannot
be stored if the available memory is not contiguous; i.e., if the size of the array is n
bytes, there must be n contiguous bytes available in memory.
The array size is fixed, so the size of the array cannot be reduced or increased at
run time based on the requirement.
Stacks
A stack is a homogeneous collection of items of any one type, arranged linearly with access at one
end only, known as the top. This means that data can be added or removed from only the top.
Formally this type of stack is called a Last In First Out (LIFO) stack. Data is added to the stack
using the Push operation, and removed using the Pop operation.
To illustrate the idea of a stack, think of a pile of plates in a cafeteria. When the plates are
stacked, each one is placed on top of the others. It would not make sense to put each plate at the
bottom of the pile, as that would be far more work. Similarly, when a plate is taken, it is usually
taken from the top of the stack.
Stacks implemented as arrays are useful if a fixed amount of data is to be used. However, if the
amount of data is not fixed or fluctuates widely during the stack's lifetime, an array is a poor
choice for implementing a stack.
Recursive calls are implemented by the computer with the help of a stack. Because the size of this
stack cannot be predicted in recursion, implementing it using an array is a poor choice in this case.
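Push:
if (top >= total_no_elements)
    return(1); // Error code: stack is full (overflow)
else
{
    printf("\n Enter the element \n");
    scanf("%d", &stack[top]); // store the element in the first free slot
    top++;                    // top holds the number of elements in the stack
}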
Pop:
if (top == 0)
{
    printf("\n STACK EMPTY \n"); // underflow: nothing to pop
}
else
{
    top--;
    printf("\n\nPopped element = %d\n", stack[top]);
}
Display:
if (top == 0)
{
    printf("\n STACK IS EMPTY \n");
}
else
{
    printf("\n The elements inside the stack are :\n");
    for (j = top - 1; j >= 0; j--) // from the top element down to the bottom
    {
        printf("\n%d", stack[j]);
    }
}
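Collected into a self-contained form, the array-based stack above could be sketched as follows. Unlike the fragments, this illustrative version takes the value as a parameter instead of reading it with scanf and reports errors through return values; the names struct int_stack, STACK_CAPACITY and the function names are assumptions. As in the fragments, top counts the elements, so the next free slot is items[top].

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_CAPACITY 100   /* hypothetical fixed capacity */

struct int_stack {
    int items[STACK_CAPACITY];
    int top;                 /* number of elements; also the next free slot */
};

void stack_init(struct int_stack *s) { s->top = 0; }

bool stack_is_empty(const struct int_stack *s) { return s->top == 0; }

/* Push: returns false on overflow (stack full). */
bool stack_push(struct int_stack *s, int value)
{
    if (s->top >= STACK_CAPACITY)
        return false;
    s->items[s->top++] = value;
    return true;
}

/* Pop: returns false on underflow (stack empty); otherwise stores the item in *value. */
bool stack_pop(struct int_stack *s, int *value)
{
    if (stack_is_empty(s))
        return false;
    *value = s->items[--s->top];
    return true;
}

/* Top: returns the item at the top without removing it. The stack must not be empty. */
int stack_top(const struct int_stack *s)
{
    assert(!stack_is_empty(s));
    return s->items[s->top - 1];
}
```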
Stack operations:
Operation: Top
Return type: Data type
Description: This operation returns the value of the item at the top of the stack.
Note: It does not remove that item.
Queues
A queue is a data structure in which elements are accessed from two different ends, called Front and
Rear. Elements are inserted into a queue at the Rear end and removed from the Front end. The
principle used in a queue is "First In First Out", or FIFO.
There are two basic operations associated with a queue: enqueue and dequeue.
Enqueue means adding a new item at the rear end of the queue. The rear end always points to the
most recently added element.
Dequeue refers to removing the item at the front end of the queue. The front end always points to
the most recently removed element.
Theoretically, a queue does not have a specific capacity. Regardless of how many elements are
already contained, a new element can always be added. It can also be empty, at which point
removing an element will be impossible until a new element has been added again.
A practical implementation of a queue does have some capacity limit. An array-based queue is limited
by the size of the array, and even a dynamically allocated queue is limited because the executing
computer will eventually run out of memory.
Queue overflow results from trying to add an element into a full queue and queue underflow
happens when trying to remove an element from an empty queue.
A queue uses two major variables, Front and Rear. Front refers to the first position of the queue and
Rear refers to the last position of the queue.
Types of queues
Circular queue
A circular queue is one in which a new element is inserted at the very first location of the queue if
the last location of the queue is full; that is, in a circular queue the first position comes just after
the last position.
A circular queue overcomes the problem of unutilized space in linear queues implemented as
arrays. A circular queue also has a Front and a Rear to keep track of the elements to be deleted
and inserted, and therefore to maintain the unique characteristic of the queue. The assumptions
made are:
1. Front always points to the first element.
2. If Front = Rear, the queue is empty.
3. Each time a new element is inserted into the queue, Rear is incremented by one.
4. Each time an element is deleted from the queue, Front is incremented by one.
When Front or Rear is incremented past the last position, it wraps around to the first position.
[Figure: circular queue with positions Q[0] to Q[4], Front and Rear marked (Before insertion)]
[Figure: circular queue, continued, with Front and Rear marked]
A double-ended queue (deque) allows elements to be inserted and deleted at both the Front and the
Rear. There are two types of deques: input-restricted deques and output-restricted deques. An
input-restricted deque allows insertion at only one end but deletion at both ends. An output-restricted
deque allows deletion at only one end but insertion at both ends.
Priority Queue
In priority queues, the items added to the queue have a priority associated with them which
determines the order in which they exit the queue. Items with highest priority are removed first.
A priority queue is an abstract data type supporting the following three operations:
add an element to the queue with an associated priority
remove the element from the queue that has the highest priority, and return it
(optionally) peek at the element with highest priority without removing it
A simple way to implement a priority queue data type is to keep an associative array mapping
each priority to a list of elements with that priority.
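A minimal sketch of this approach follows. It assumes a small fixed set of integer priorities, so the associative array can be a plain C array indexed by priority, and it keeps each priority's list as a small circular FIFO; the names and capacities are hypothetical. Elements with equal priority leave in the order they arrived.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PRIORITIES 4    /* hypothetical: priorities 0 (lowest) .. 3 (highest) */
#define PER_LEVEL 16        /* hypothetical capacity of each priority's list */

/* Array indexed by priority; entry p is a FIFO list of elements with priority p. */
struct priority_queue {
    int items[NUM_PRIORITIES][PER_LEVEL];
    int head[NUM_PRIORITIES];    /* position of the next element to remove */
    int count[NUM_PRIORITIES];   /* number of elements stored at this priority */
};

void pq_init(struct priority_queue *pq)
{
    for (int p = 0; p < NUM_PRIORITIES; p++)
        pq->head[p] = pq->count[p] = 0;
}

/* Adds an element with an associated priority; false if that list is full. */
bool pq_add(struct priority_queue *pq, int value, int priority)
{
    assert(priority >= 0 && priority < NUM_PRIORITIES);
    if (pq->count[priority] == PER_LEVEL)
        return false;
    int slot = (pq->head[priority] + pq->count[priority]) % PER_LEVEL;
    pq->items[priority][slot] = value;
    pq->count[priority]++;
    return true;
}

/* Highest priority with a non-empty list, or -1 if the queue is empty. */
static int highest_level(const struct priority_queue *pq)
{
    for (int p = NUM_PRIORITIES - 1; p >= 0; p--)
        if (pq->count[p] > 0)
            return p;
    return -1;
}

/* Peek: the highest-priority element, without removing it. */
bool pq_peek(const struct priority_queue *pq, int *value)
{
    int p = highest_level(pq);
    if (p < 0)
        return false;
    *value = pq->items[p][pq->head[p]];
    return true;
}

/* Removes and returns the element with the highest priority. */
bool pq_remove(struct priority_queue *pq, int *value)
{
    int p = highest_level(pq);
    if (p < 0)
        return false;
    *value = pq->items[p][pq->head[p]];
    pq->head[p] = (pq->head[p] + 1) % PER_LEVEL;
    pq->count[p]--;
    return true;
}
```

For large or sparse priority ranges, a binary heap is the usual alternative to this array of lists.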
Applications of queues
Round robin technique for processor scheduling uses the concept of queues
Railway ticket reservation center is designed using queues to store customer
information
Printer server routines are designed using queues
In a stack, each new data item is stored at the top of the stack, and ‘Top’ marks that position.
When a new data item is added, it is stored in the Top position and the Top pointer is increased.
Summary
An array is a collection of individual values of the same data type stored in adjacent
memory locations.
A stack is a homogeneous collection of items of any one type, arranged linearly with
access at one end only, known as the top. The two major operations available for a stack
are push (adding an element) and pop (deleting an element).
A queue is a collection of items in which only the earliest added item may be accessed. Its
basic operations are add (to the tail), or enqueue, and delete (from the head), or dequeue.
The major variations of queues are the double-ended queue, the circular queue and the priority queue.
Answers
1. b
2. d
Learning Objectives
After completing this chapter, you will be able to:
Define linked list
Implement linked list operations in your program
Linked lists
A linked list can be viewed as a group of items, each of which points to the item in its
neighbourhood. An item in a linked list is known as a node. A node contains a data part and one or
two pointer parts, which contain the addresses of the neighbouring nodes in the list. A linked list
supports dynamic memory allocation, so it avoids the main problems of arrays: it needs neither a
fixed size nor a single contiguous block of memory.
An example of a singly linked list can be pictured as shown below. Note that each node is pictured
as a box, while each pointer is drawn as an arrow. A NULL pointer is used to mark the end of the
list.
The above code creates the list by reading values until the user enters -999.
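A hedged sketch of such creation code is shown below. It assumes a node type with fields info and next (the names used by the insertion and deletion code in this section) and, so that it can run without a keyboard, reads the values from an array instead of from the user; the function name create_list is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define SENTINEL (-999)   /* input value that ends the list */

struct node {
    int info;
    struct node *next;
};

/* Appends values from input[] until SENTINEL is seen and returns the root node.
   A pointer to the last node is kept so that each append takes constant time. */
struct node *create_list(const int input[])
{
    assert(input != NULL);
    struct node *root_node = NULL, *last = NULL;
    for (int i = 0; input[i] != SENTINEL; i++) {
        struct node *current_node = malloc(sizeof *current_node);
        if (current_node == NULL)
            break;                          /* out of memory: stop reading */
        current_node->info = input[i];
        current_node->next = NULL;
        if (root_node == NULL)
            root_node = current_node;       /* the first node becomes the root */
        else
            last->next = current_node;
        last = current_node;
    }
    return root_node;
}
```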
Inserting an element
After reading the position and the element to be inserted, the following code can be used to insert
the element into the list:
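if (position == 1 || root_node == NULL)
{
    // Insert at the beginning (or into an empty list)
    current_node->next = root_node;
    root_node = current_node;
}
else
{
    // Walk to the node after which the new node goes; stop at the last node
    // if the position is past the end of the list
    counter = 2;
    temp_node = root_node;
    while ((counter < position) && (temp_node->next != NULL))
    {
        counter++;
        temp_node = temp_node->next;
    }
    current_node->next = temp_node->next;
    temp_node->next = current_node;
}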
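The same insertion logic, packaged as a self-contained function, might look like this. The function name insert_at, the node type, and the choice to append the node when the position is past the end of the list are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *next;
};

/* Inserts value so that it becomes node number `position` (1-based).
   A position past the end appends the node. Returns the (possibly new) root. */
struct node *insert_at(struct node *root_node, int value, int position)
{
    assert(position >= 1);
    struct node *current_node = malloc(sizeof *current_node);
    if (current_node == NULL)
        return root_node;                  /* out of memory: list unchanged */
    current_node->info = value;

    if (position == 1 || root_node == NULL) {
        current_node->next = root_node;    /* insert at the beginning */
        return current_node;
    }

    struct node *temp_node = root_node;    /* find the node before the new one */
    for (int counter = 2; counter < position && temp_node->next != NULL; counter++)
        temp_node = temp_node->next;

    current_node->next = temp_node->next;  /* link in after temp_node */
    temp_node->next = current_node;
    return root_node;
}
```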
The following figure illustrates how a node is inserted at an intermediate position in the list.
The following figure illustrates how a node is inserted at the beginning of the list.
Deleting an element
After reading the element to be removed, the following code can be used to remove it from the
list:
temp_node = root_node;
if (root_node == NULL)
    return; // empty list: nothing to delete
if (root_node->info == input_element)
{
    // Deleting the first node
    root_node = root_node->next;
    free(temp_node);
    return;
}
// Find the node just before the one holding input_element
while (temp_node->next != NULL && temp_node->next->info != input_element)
    temp_node = temp_node->next;
if (temp_node->next != NULL)
{
    delete_node = temp_node->next;
    temp_node->next = delete_node->next;
    free(delete_node);
}
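A self-contained version of the deletion code, written as a function that returns the possibly changed root, could be sketched as follows; the function name delete_element and the node type are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int info;
    struct node *next;
};

/* Removes the first node whose info equals input_element and returns the
   (possibly new) root. The list is unchanged if no node matches. */
struct node *delete_element(struct node *root_node, int input_element)
{
    if (root_node == NULL)
        return NULL;                               /* empty list */

    if (root_node->info == input_element) {        /* deleting the first node */
        struct node *new_root = root_node->next;
        free(root_node);
        return new_root;
    }

    struct node *temp_node = root_node;            /* find the predecessor */
    while (temp_node->next != NULL && temp_node->next->info != input_element)
        temp_node = temp_node->next;

    if (temp_node->next != NULL) {                 /* found: unlink and free */
        struct node *delete_node = temp_node->next;
        temp_node->next = delete_node->next;
        free(delete_node);
    }
    return root_node;
}
```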
The following figures illustrate the deletion of an intermediate node and the deletion of the first
node from the list.
temp_node = root_node;
while (temp_node != NULL) // visit every node from the root to the end of the list
{
    printf("%d\t", temp_node->info);
    temp_node = temp_node->next;
}
Node structure
The pointer pointing to the whole list is usually called the end pointer.
Singly-circularly-linked list
In a singly-circularly-linked list, each node has one link, similar to an ordinary singly-linked list,
except that the link of the last node points back to the first node. As in a singly-linked list, new
nodes can be inserted efficiently only after a node to which we already have a reference. For this
reason, it is usual to retain a reference only to the last element in a singly-circularly-linked list, as
this allows quick insertion at the beginning and also allows access to the first node through the last
node's next pointer. The following figure shows a singly circularly linked list.
[Figure: singly circularly linked list with nodes 10, 20, 30 and 40; the last node links back to the first]
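The following sketch (type and function names are hypothetical) keeps only a pointer to the last node, as described above. Because last->next is the first node, inserting at the beginning and at the end both take constant time: insert_last links the new node in at the front and then makes it the new last node.

```c
#include <assert.h>
#include <stdlib.h>

/* Node of a singly circular list; the list is accessed through its last node. */
struct cnode {
    int info;
    struct cnode *next;
};

/* Links n in as the first node (right after last); returns the last node. */
static struct cnode *link_front(struct cnode *last, struct cnode *n)
{
    assert(n != NULL);
    if (last == NULL) {
        n->next = n;            /* a single node points to itself */
        return n;
    }
    n->next = last->next;       /* new node points to the old first node */
    last->next = n;             /* last node now points to the new first node */
    return last;
}

/* Constant-time insertion at the beginning. Returns the last node. */
struct cnode *insert_first(struct cnode *last, int value)
{
    struct cnode *n = malloc(sizeof *n);
    if (n == NULL)
        return last;            /* out of memory: list unchanged */
    n->info = value;
    return link_front(last, n);
}

/* Constant-time insertion at the end: link in at the front, then make it last. */
struct cnode *insert_last(struct cnode *last, int value)
{
    struct cnode *n = malloc(sizeof *n);
    if (n == NULL)
        return last;
    n->info = value;
    link_front(last, n);
    return n;
}
```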
Doubly-circularly-linked list
In a doubly-circularly-linked list, each node has two links, similar to a doubly-linked list, except that
the previous link of the first node points to the last node and the next link of the last node points to
the first node. As in doubly-linked lists, insertions and removals can be done at any point with
access to any nearby node.
[Figure: doubly circularly linked list with nodes 10, 20, 30 and 40]
Access to an element in a doubly circularly linked list can be easier than in a linearly linked list,
since the element can be approached from two directions. For example, to access the element in the
fourth node of a doubly circularly linked list of five nodes, it is enough to start from the last node
and move one node in the reverse direction to get the value in the fourth node.
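A sketch of that idea follows (the structure and function names are illustrative). To reach the k-th of count nodes, the function walks forward from the first node or backward from the last node (first->prev), whichever takes fewer steps; for the fourth of five nodes this is a single backward step.

```c
#include <assert.h>
#include <stddef.h>

struct dnode {
    int info;
    struct dnode *prev, *next;
};

/* Returns the k-th node (1-based) of a doubly circular list of `count` nodes
   starting at `first`, walking in whichever direction is shorter. */
struct dnode *nth_node(struct dnode *first, int count, int k)
{
    assert(first != NULL && k >= 1 && k <= count);
    struct dnode *p = first;
    if (k - 1 <= count - k) {
        for (int i = 1; i < k; i++)        /* forward from the first node */
            p = p->next;
    } else {
        p = first->prev;                   /* the last node */
        for (int i = count; i > k; i--)    /* backward from the last node */
            p = p->prev;
    }
    return p;
}
```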
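The following code can be used to delete the element at the front of the queue:
if (*front_node != NULL)
{
    temp = *front_node; // temp: a node pointer, used to free the removed node
    printf("The item deleted is %d", (*front_node)->info);
    if (*front_node == *rear_node)
    {
        *front_node = *rear_node = NULL; // only one node: the queue becomes empty
    }
    else
    {
        *front_node = (*front_node)->link; // advance front
        (*rear_node)->link = *front_node;  // keep the list circular
    }
    free(temp);
}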
One disadvantage of using an array to implement a stack or queue is wasted space: most of the time,
much of the space in the array is unused. A more elegant and economical implementation of a stack
or queue uses a linked list.
Here is a sketch of a linked-list-based stack that holds 1 at the top, then 5, and then 20 at the bottom:
Top -> 1 -> 5 -> 20 -> NULL
The list consists of three cells, each of which holds a data object and a link to another cell. A
variable, top, holds the address of the first cell in the list. An empty stack is represented by top
being NULL:
Top -> NULL
Implementing stacks as linked lists removes the fixed limit on the number of nodes, because a
linked list is a dynamic data structure: the stack can grow or shrink as the program demands.
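Push
node = malloc(sizeof(struct stack));
if (node == NULL)
    return(1); // Error code: out of memory
printf("\n\n Enter the data ");
scanf("%d", &node->data);
node->link = top; // the new node points to the old top
top = node;       // and becomes the new top
Pop
if (top == NULL)
    return(1); // Error code: stack underflow
else
{
    temp = top; // temp: a struct stack pointer, used to free the removed node
    printf("\n \n Item deleted is %d ", top->data);
    top = top->link;
    free(temp);
}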
Display
i = top;
if (top == NULL)
    return(1); // Error code: stack is empty
else
{
    printf(" \n\n ELEMENTS ARE : \n");
    while (i != NULL) // walk from the top node down to the bottom
    {
        printf("%d\n\n", i->data);
        i = i->link;
    }
}
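Collected into functions, the linked stack above might be sketched as follows. It keeps the struct stack with fields data and link used by the fragments, but passes top by address instead of using a global variable and takes the value as a parameter instead of reading it with scanf; those changes and the function signatures are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct stack {
    int data;
    struct stack *link;
};

/* Push: the new node links to the old top and becomes the new top. */
bool push(struct stack **top, int value)
{
    assert(top != NULL);
    struct stack *node = malloc(sizeof *node);
    if (node == NULL)
        return false;              /* out of memory */
    node->data = value;
    node->link = *top;
    *top = node;
    return true;
}

/* Pop: unlinks and frees the top node, storing its data in *value. */
bool pop(struct stack **top, int *value)
{
    assert(top != NULL && value != NULL);
    if (*top == NULL)
        return false;              /* stack underflow */
    struct stack *temp = *top;
    *value = temp->data;
    *top = temp->link;
    free(temp);
    return true;
}
```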
Implementation of queues using linked lists is very similar to the implementation of stacks, except
that items join the queue at the back and leave at the front. If the queue is represented by the list
[5, 2], adding a new item 3 gives the list [5, 2, 3]. In other words, new items are added to the end of
the list, and items are removed from the front.
A pictorial representation of a queue implemented as a linked list is given below. The variable rear
points to the last item in the queue.
Front -> 5 -> 2 -> 3 -> NULL
(Rear points to the node holding 3)
Inserting an element
new_element->link = NULL;
if (front == NULL)
    front = new_element;      // empty queue: the new node is also the front
else
    rear->link = new_element; // join at the back
rear = new_element;
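Deleting an element
if (front == NULL)
    return(1); // Error code: queue underflow
temp = front;
front = front->link;
if (front == NULL) rear = NULL; // the queue is now empty
free(temp);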
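Here is a self-contained sketch of a linked-list queue built from the insertion and deletion code above. The node and queue type names and the function names are hypothetical; the front and rear pointers play the same roles as in the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct qnode {
    int data;
    struct qnode *link;
};

struct linked_queue {
    struct qnode *front;   /* first node: items leave here */
    struct qnode *rear;    /* last node: items join here */
};

void lq_init(struct linked_queue *q) { q->front = q->rear = NULL; }

/* Adds value at the rear; returns false if memory runs out. */
bool lq_insert(struct linked_queue *q, int value)
{
    struct qnode *new_element = malloc(sizeof *new_element);
    if (new_element == NULL)
        return false;
    new_element->data = value;
    new_element->link = NULL;
    if (q->front == NULL)
        q->front = new_element;          /* empty queue: also the front */
    else
        q->rear->link = new_element;     /* join at the back */
    q->rear = new_element;
    return true;
}

/* Removes the front item into *value; returns false on underflow. */
bool lq_delete(struct linked_queue *q, int *value)
{
    assert(value != NULL);
    if (q->front == NULL)
        return false;
    struct qnode *temp = q->front;
    *value = temp->data;
    q->front = temp->link;
    if (q->front == NULL)
        q->rear = NULL;                  /* the queue is now empty */
    free(temp);
    return true;
}
```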
Summary
A linked list is a collection of elements called nodes, each of which contains a data
portion and a pointer to the node following it in the linear ordering of the list.
A singly linked list is a dynamic data structure that can grow and shrink depending
on the operations performed. Each node has a single pointer, which points to the next
node in the list.
A doubly linked list is one in which each node has two links, which give access to both
the successor node and the predecessor node from a given node. It allows traversal in
both directions.
A circular linked list is one that has no end; i.e., the link field of the last node does
not point to NULL but points back to the beginning of the list.
Stacks and queues can be implemented more flexibly using pointers (linked lists) than
using arrays, because their size does not have to be fixed in advance.
2. To delete a node N from a linear linked list, you will need to ______.
a. Set the link in the node that precedes N to link in the node that follows N
b. Set the link in the node that precedes N to link N
c. Set the link in the node that follows N to link in the node that precedes N
d. Set the link in N to link in the node that follows N
3. Write a function that removes all duplicate elements from a linear linked list.
4. Write a function to print the elements in reverse order of a singly linked list.
Answers
1. b
2. a
Learning Objectives
After completing this chapter, you will be able to:
Explain the concepts of sorting and searching
List the advantages of each technique
List the limitations of each technique
Sorting
Sorting refers to ordering data in an increasing or decreasing fashion according to some linear
relationship among the data items.
Sorting can be done on names, numbers and records. Sorting reduces the time needed to search for
data. For example, it is relatively easy to look up the phone number of a friend in a telephone
directory because the names in the phone book have been sorted into alphabetical order. This
example illustrates one of the main reasons that sorting large quantities of information is desirable:
sorting greatly improves the efficiency of searching. If the names in a phone book were not
presented in any logical order, it would take an incredibly long time to look up someone's phone
number.
Selection Sort.
In this method, the successive elements are selected in order and are placed in their proper sorted
positions.
Insertion sort.
In this method, sorting is done by inserting elements into an existing sorted list. Initially, the sorted
list has only one element. Other elements are gradually added into the list in the proper position.
Bubble Sort.
In this method, the entire file is passed through several times. Each pass compares each element
with its successor and swaps the two if they are out of order.
Merge Sort.
In this method, the elements are repeatedly divided into partitions until each partition contains a
single element and is therefore sorted. These partitions are then merged, with the elements placed in
their proper positions, to produce a fully sorted list.
Quick Sort.
In this method, an element called pivot is identified and that element is fixed in its place by moving
all the elements less than that to its left and all the elements greater than that to its right.
Radix Sort.
In this method, sorting is done based on the place values of the number. In this scheme, sorting is
done on the less-significant digits first. When all the numbers are sorted on a more significant digit,
numbers that have the same digit in that position but different digits in a less-significant position
are already sorted on the less-significant position.
Heap Sort
In this method, the file to be sorted is interpreted as a binary tree. Array, which is a sequential
representation of binary tree, is used to implement the heap sort.
In this chapter, focus is given to bubble sort, quick sort and heap sort.
The basic premise behind sorting an array is that its elements start out in some random order and
need to be arranged from lowest to highest. For example, the list 1 3 5 7 is sorted, whereas the list
3 1 7 5 is not. The property that makes the second one "not sorted" is that there are adjacent elements
that are out of order. The first item is greater than the second instead of less, and likewise the third
is greater than the fourth and so on. Once this observation is made, it is not very hard to devise a
sort that proceeds by examining adjacent elements to see if they are in order, and swapping them
if they are not.
Bubble Sort
This sorting technique is so named because its logic resembles bubbles in water: a bubble formed at
the bottom is small and grows bigger as it rises, so bubbles are in ascending order of size from the
bottom to the top. This sorting method proceeds by scanning through the elements one pair at a
time and swapping any adjacent pairs it finds to be out of order.
Example 6.1
Input sequence: 34 8 64 51 32 21
Each pass consists of comparing each element in the file with its successor (x[i] > x[i+1]) and
swapping the two elements if they are not in the proper order. After each pass i, the element
x[n-(i-1)], the i-th largest, is in its proper position within the sorted array.
In the first pass, n-1 pairs have to be compared, and the largest item moves to its correct position.
On the second pass, the second largest item moves to its correct position, and on the third pass
(stopping at item n-3) the third largest is in place. It is this gradual filtration, or bubbling, of the
larger items to the top end that gives this sorting technique its name.
There are two ways in which the sort can terminate with everything in the right order. It could
complete by reaching the (n-1)th pass and placing the second smallest item in its correct position.
Alternatively, it could find on some earlier pass that nothing needs to be swapped. That is, all
adjacent pairs are already in the correct order. In this case, there is no need to go on to
subsequent passes, for the sort is complete already. If the list started in sorted order, this would
happen on the very first pass. If it started in reverse order, it would not happen until the last one.
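Here is a sketch of bubble sort in C that includes the early exit described above: if a pass makes no swaps, the list is already sorted and the sort stops. The function name and the choice to return the number of passes are illustrative. Traced by hand on the input of Example 6.1, the passes give 8 34 51 32 21 64, then 8 34 32 21 51 64, then 8 32 21 34 51 64, then 8 21 32 34 51 64, and the fifth pass makes no swaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sorts x[0..n-1] in ascending order and returns the number of passes made.
   Stops early when a pass makes no swaps. */
int bubble_sort(int x[], size_t n)
{
    assert(x != NULL || n == 0);
    int passes = 0;
    for (size_t i = 1; i < n; i++) {            /* at most n-1 passes */
        bool swapped = false;
        passes++;
        for (size_t j = 0; j + i < n; j++) {    /* pass i makes n-i comparisons */
            if (x[j] > x[j + 1]) {
                int t = x[j];
                x[j] = x[j + 1];
                x[j + 1] = t;
                swapped = true;
            }
        }
        if (!swapped)
            break;                              /* already sorted: stop early */
    }
    return passes;
}
```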
Quick Sort
In this sort, an element called the pivot is identified, and that element is fixed in its place by moving
all the elements less than it to its left and all the elements greater than it to its right. Since it
partitions the element sequence into a left part, the pivot and a right part, it is referred to as sorting
by partitioning. Instead of moving a single element toward its place, a pair of elements is moved in a
single swap, which makes the sorting quick. After the partitioning, each of the sub-lists is sorted,
which causes the entire array to be sorted.
int quickPartition(int first, int last);

void quickSort(int first, int last)
{
    int mid;
    if (first < last) /* if the part being sorted has more than one element */
    {
        mid = quickPartition(first, last);
        if (mid - 1 > first)
            quickSort(first, mid - 1);   /* sort the lower section */
        if (mid + 1 < last)
            quickSort(mid + 1, last);    /* sort the upper section */
    }
}
The hardest part of quick sort is the partitioning of elements. The algorithm looks at the first
element of the array (called the "pivot"). It puts all of the elements that are less than or equal to the
pivot in the lower portion of the array and the elements greater than the pivot in the upper portion.
When that is complete, it can put the pivot between those two sections, and quick sort can then
sort the two sections separately.
The details of the partitioning algorithm depend on counters that move from the ends of the array
toward the center. Each counter moves until it finds a value that is in the wrong section of the array
(larger than the pivot and in the lower portion, or less than or equal to the pivot and in the upper
portion). Those entries are swapped to put them into their appropriate sections, and the counters
continue searching for out-of-place values. When the two counters cross, partitioning is complete
and the pivot can be swapped to its proper place between the two sections.
/* data[] is the global array being sorted; swap(i, j) exchanges data[i] and data[j] */
int quickPartition(int first, int last)
{
    int mid_val = data[first]; /* This is the pivot value */
    int i = first + 1;
    int j = last;
    while (i <= j)
    {
        while ((i < last) && (data[i] <= mid_val))
            i++;
        while ((j >= first) && (data[j] > mid_val))
            j--;
        if (i < j)
            swap(i, j);   /* both are in the wrong section: exchange them */
        else
            i++;
    }
    if (j != first)
        swap(j, first);   /* place the pivot between the two sections */
    return j;
}
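Example 6.2
Input sequence: 34, 8, 64, 51, 32, 21
Square brackets demarcate subfiles yet to be sorted. R1 to R6 are the array positions, and m and n
are the first and last positions of the subfile to be sorted next.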
R1 R2 R3 R4 R5 R6 m n
[34 8 64 51 32 21] 1 6
[32 8 21] 34 [51 64] 1 3
[21 8] 32 34 [51 64] 1 2
[8] 21 32 34 [51 64] 1 1
8 21 32 34 [51 64] 5 6
8 21 32 34 51 [64] 6 6
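For experimentation, the two routines above can be combined into one self-contained sketch. It assumes the array is passed as a parameter instead of living in a global data array, and the names quick_sort, partition and swap_items are hypothetical; the partitioning logic is the same two-counter scheme. Because quick_sort checks first < last itself, the recursive calls do not need the extra size tests used in quickSort.

```c
/* Exchanges a[p] and a[q]. */
static void swap_items(int a[], int p, int q)
{
    int tmp = a[p];
    a[p] = a[q];
    a[q] = tmp;
}

/* Partitions a[first..last] around the pivot a[first] using two
   counters moving toward the center; returns the pivot's final index. */
static int partition(int a[], int first, int last)
{
    int pivot = a[first];
    int i = first + 1, j = last;

    while (i <= j)
    {
        while (i < last && a[i] <= pivot)
            i++;
        while (j >= first && a[j] > pivot)
            j--;
        if (i < j)
            swap_items(a, i, j);
        else
            i++;
    }
    if (j != first)
        swap_items(a, j, first);
    return j;
}

/* Sorts a[first..last] in ascending order. */
void quick_sort(int a[], int first, int last)
{
    if (first < last)
    {
        int mid = partition(a, first, last);
        quick_sort(a, first, mid - 1);
        quick_sort(a, mid + 1, last);
    }
}
```

Partitioning the whole input 34, 8, 64, 51, 32, 21 (indexes 0 to 5) returns index 3 and leaves 32 8 21 34 51 64, which matches the second row of the trace.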
Heap Sort
In heap sort the file to be sorted is interpreted as a binary tree. The technique is implemented
using an array, which is a sequential representation of the binary tree. Positions are numbered from
1, with the root at position 1, and the position of each node is related to the others as follows:
For a node at position i, the parent is at position i/2 (using integer division), the left child is at
position 2i and the right child is at position 2i+1. A child exists only if its position is at most n;
otherwise that child does not exist.
Heap sort is a two stage method. In the first stage the tree representing the input data is converted
into a heap. A heap can be defined as a complete binary tree with the property that the value of
each node is at least as large as the value of its children nodes. This, in turn, gives the root of the
tree as the largest key. In the second stage the output sequence is generated in decreasing order
by outputting the root and restructuring the remaining tree into a heap.
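The heap property can be checked directly from the position arithmetic above. This sketch assumes the tree is stored in x[1..n] with x[0] unused, matching the 1-based positions; is_max_heap is a hypothetical helper, not part of the sorting algorithm itself.

```c
#include <stdbool.h>

/* Checks the heap property for x[1..n] (x[0] is unused): every node
   must be at least as large as its children at positions 2i and 2i+1. */
bool is_max_heap(const int x[], int n)
{
    int i;
    for (i = 2; i <= n; i++)
    {
        /* the parent of position i is i/2 (integer division) */
        if (x[i / 2] < x[i])
            return false;
    }
    return true;
}
```

For the input order 34, 8, 64, 51, 32, 21 it returns false (64 is larger than its parent 34), while the arrangement 64, 51, 34, 8, 32, 21 passes.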
Example 6.3
The list of numbers 34, 8, 64, 51, 32, 21 is initially arranged in an array as in the input file of the
example given below. Here the value of n is 6, hence the last parent is at position 6/2 = 3. Parent 64
(index 3) is compared with its only child 21; since 64 > 21, it is retained in its position. Parent 8
(index 2) is compared with its larger child 51, and they are interchanged since 8 < 51. Now the root
34 (index 1) is compared with its larger child 64, and they are interchanged since 34 < 64; 34, now
at index 3, is larger than its child 21, so it stays there. The result is shown as the initial heap.
[Figure: the input file 34, 8, 64, 51, 32, 21 drawn as a binary tree; the initial heap 64, 51, 34, 8, 32, 21; and the trees after each largest remaining key is moved to the end, finishing with the sorted array 8, 21, 32, 34, 51, 64]
Algorithm 6.3.2
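/* Restores the heap property for the subtree rooted at position i of
   x[1..n] (x[0] is unused), assuming both subtrees of i are already heaps. */
void adjust(int x[], int i, int n)
{
    int item, j;
    j = 2 * i;                 /* left child of i */
    item = x[i];               /* value being sifted down */
    while (j <= n)
    {
        if ((j < n) && (x[j] < x[j + 1]))
            j = j + 1;         /* j now indexes the larger child */
        if (item >= x[j])
            break;             /* item belongs at position j/2 */
        x[j / 2] = x[j];       /* move the larger child up one level */
        j = 2 * j;
    }
    x[j / 2] = item;
}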
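Algorithm 6.3.2 only re-heaps one subtree; the two stages of heap sort described earlier can be driven by it as in the following sketch. It assumes 1-based storage in x[1..n] with x[0] unused, as adjust expects; heap_sort is a hypothetical name, and adjust is repeated so the block compiles on its own.

```c
/* Sift-down as in Algorithm 6.3.2: x[1..n] is the tree, x[0] unused. */
static void adjust(int x[], int i, int n)
{
    int item = x[i];
    int j = 2 * i;             /* left child of i */

    while (j <= n)
    {
        if (j < n && x[j] < x[j + 1])
            j = j + 1;         /* pick the larger child */
        if (item >= x[j])
            break;             /* item belongs at position j/2 */
        x[j / 2] = x[j];       /* move the larger child up one level */
        j = 2 * j;
    }
    x[j / 2] = item;
}

/* Sorts x[1..n] into ascending order. */
void heap_sort(int x[], int n)
{
    int i, tmp;

    /* Stage 1: build a heap, starting from the last parent n/2 */
    for (i = n / 2; i >= 1; i--)
        adjust(x, i, n);

    /* Stage 2: move the root (largest key) to the end, then re-heap */
    for (i = n; i >= 2; i--)
    {
        tmp = x[1];
        x[1] = x[i];
        x[i] = tmp;
        adjust(x, 1, i - 1);
    }
}
```

Stage 1 turns 34, 8, 64, 51, 32, 21 into the initial heap 64, 51, 34, 8, 32, 21 of Example 6.3, and stage 2 then produces 8, 21, 32, 34, 51, 64. Swapping each root into the last position of the shrinking heap, instead of printing it, is what lets the output be stored in ascending order in the same array.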
Searching
Searching is a process of locating a particular element present in a given set of elements. The
element may be a record, a table, or a file.
A search algorithm is an algorithm that accepts an argument ‘a’ and tries to find an element whose
value is ‘a’. It is possible that the search for a particular element in a set is unsuccessful if that
element does not exist. There are a number of techniques available for searching; linear search
and binary search are discussed in this session.
Linear Search
In linear search the list is searched sequentially, and the position of the key element is reported
if it is present in the list; otherwise -1 is reported. The search starts at the beginning of the array
and moves to the end, testing for a match at each item. All the elements preceding the search
element are examined before the search element is reached, i.e., if the element to be searched for
is in position 10, all the elements in positions 1 to 9 are checked before position 10.
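#include <stdbool.h>

/* Basic linear search: on return, *rec holds the index of key in list,
   or -1 if key is not present. */
bool linear_search(int *list, int size, int key, int *rec)
{
    bool found = false;
    int i;
    *rec = -1;
    for (i = 0; i < size; i++)
    {
        if (key == list[i])
            break;              /* key found: stop scanning early */
    }
    if (i < size)
    {
        found = true;
        *rec = i;               /* report the position of the match */
    }
    return found;
}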
The code searches for the element through a loop running from index 0 to size - 1. The loop can
terminate in one of two ways. If the index variable i reaches the end of the list, the loop condition
fails. If the current item in the list matches the key, the loop is terminated early with a break
statement. The algorithm then tests the index variable to see whether it is less than size (thus the
loop was terminated early and the item was found) or not (and the item was not found).
Example 6.4
Assume the element 45 is searched for in the sorted sequence 12, 18, 25, 36, 45, 48, 50. The
linear search starts from the first element 12; since the value being searched for is 45, not 12, the
next element 18 is compared, which is also not 45. In this way all the elements before 45 are
compared, and when the position is 5 the element 45 is compared with the search value and found
to be equal. Hence the element is found at position 5 (positions are counted from 1 here; in the C
code above, which counts from 0, this is index 4).
List Position Comparison
12 18 25 36 45 48 50 1 12 <> 45 : false
12 18 25 36 45 48 50 2 18 <> 45 : false
12 18 25 36 45 48 50 3 25 <> 45 : false
12 18 25 36 45 48 50 4 36 <> 45 : false
12 18 25 36 45 48 50 5 45 = 45 : true
Binary Search
In a linear search, the search is done over the entire list even if the element to be searched for is
not present. Some improvements work to minimize the cost of traversing the whole data set, but
those improvements only cover up what is really a problem with the algorithm. By thinking of the
data in a different way, we can make speed improvements that are much better than anything
linear search can guarantee. Consider a list in sorted order. It would work to search from the
beginning until the item is found or the end is reached, but it makes more sense to remove as much
of the working data set as possible so that the item is found more quickly. If we start at the middle
of the list, we can determine which half the item is in (because the list is sorted). This effectively
divides the working range in half with a single test, which reduces the time complexity from O(n)
for linear search to O(log n).
Algorithm:
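#include <stdbool.h>

/* On return, *rec holds the index of key in the sorted list, or -1. */
bool Binary_Search(int *list, int size, int key, int *rec)
{
    bool found = false;
    int low = 0, high = size - 1;
    *rec = -1;
    while (high >= low)
    {
        int mid = low + (high - low) / 2;   /* avoids overflow of low + high */
        if (key < list[mid])
            high = mid - 1;                 /* key can only be in the left half */
        else if (key > list[mid])
            low = mid + 1;                  /* key can only be in the right half */
        else
        {
            found = true;
            *rec = mid;
            break;
        }
    }
    return found;
}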
Example 6.5
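Binary search is applied to the data of Example 6.4 (positions counted from 1):
low high mid Comparison
1 7 4 45 > 36 : search the right half
5 7 6 45 < 48 : search the left half
5 5 5 45 = 45 : Found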
Summary
Sorting is the process of arranging elements in either ascending or descending order. This
makes searching faster.
Bubble sort compares each element with its adjacent element, so that on each pass the
largest remaining value is moved to the end.
Quick sort is sorting by partitioning. Instead of a single element, a pair of elements is
placed in one swap.
Heap sort sorts by arranging the elements into a heap (a binary tree). It has the same
complexity, O(n log n), in its worst, best and average cases.
In linear search all the elements preceding the search element must be examined.
In binary search the middle element is compared, and then only the left or the right part is
checked instead of all the elements.
Answers
1. a
2. c
Session 8: Trees
Learning Objectives
After completing this chapter, you will be able to
Describe a tree
Explain how a tree can be represented internally
Describe how a tree can be traversed
Overview:
The data structures discussed in the previous sessions, like lists, stacks, and queues, are all linear
data structures. A tree is one of several types of non-linear data structure.
The parent of a particular node is the node directly above it in the hierarchy. A node can have
exactly one parent (except the root, which has none), i.e., a node can be attached to exactly one
node in the level above it.
Example 8.1
[Figure: a general tree; nodes B, C and D form the second level and nodes E, F, G and H the third level]
The following table lists some of the important terms related to a general tree structure.
Binary Tree
A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint
binary trees, called the left and right subtrees. In other words, it is a tree in which every node has a
maximum degree of 2, i.e., a node can have at most two children.
Example 8.2
[Figure: a binary tree with B and C at the second level and D, F and G at the third level]
Full Binary tree: A binary tree in which every non-leaf node has two children and all the leaf nodes
are at the same level is called a full binary tree.
Example 8.3
[Figure: a full binary tree with B and C at the second level and D, E, F and G at the third level]
Example 8.4
[Figure: a binary tree with B and C at the second level and D and E at the third level]
In a binary tree, the maximum number of nodes at level i (the level of the root node is 1) is $2^{i-1}$,
and the maximum number of nodes up to and including level i is $2^i - 1$.
Example 8.5
In example 8.2:
Maximum number of nodes at level 2 is $2^{2-1} = 2$
Maximum number of nodes at level 3 is $2^{3-1} = 4$
Maximum number of nodes up to level 2 is $2^2 - 1 = 3$
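The second bound follows from the first: adding the maximum number of nodes on each level from 1 to i gives a geometric series.

```latex
\sum_{k=1}^{i} 2^{k-1} = 1 + 2 + 4 + \cdots + 2^{i-1} = 2^{i} - 1,
\qquad \text{e.g. } i = 2:\; 2^{0} + 2^{1} = 3 = 2^{2} - 1
```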
Example 8.6
[Figure: two binary trees, each with the nodes A, B and D and one node per level]
Tree Representation
Array representation
A binary tree can be represented in an array, as discussed under heap sort.
Linked representation
Since a binary-tree node never has more than two children, a node can be represented with three
fields: one for the data in the node and the remaining two for pointers to its two children.
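The three-field node can be sketched in C as below. The field names data, left and right follow the traversal code in this session; the make_node helper is a hypothetical convenience, and returning NULL on allocation failure is one possible design choice.

```c
#include <stdlib.h>

/* Linked representation: one field for the data and two child pointers. */
struct btreenode
{
    int data;
    struct btreenode *left;   /* left child, or NULL if none */
    struct btreenode *right;  /* right child, or NULL if none */
};

/* Allocates a node holding value with no children; returns NULL if
   memory is exhausted. */
struct btreenode *make_node(int value)
{
    struct btreenode *node = malloc(sizeof *node);
    if (node != NULL)
    {
        node->data = value;
        node->left = NULL;
        node->right = NULL;
    }
    return node;
}
```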
Many algorithms pertaining to tree structures usually involve a process in which each node of the
tree is “visited”, or processed, exactly once. Such a process is called a traversal.
Tree Traversals
A binary tree can be traversed in three different ways:
Inorder traversal
Preorder traversal
Postorder traversal
In all the traversal types the relative order of the left and right subtrees is not changed, i.e., the
left subtree is always traversed before the right subtree. The type of traversal is decided by when
the data of a node is visited relative to its subtrees.
In preorder traversal the data is visited before its subtrees are traversed.
In postorder traversal the data is visited after its subtrees are traversed.
In inorder traversal the data is visited between its left and right subtrees.
Example 8.7
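[Figure: a binary tree with B and C at the second level, D, E, F and G at the third level, and I and J at the fourth level]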
Inorder traversal
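#include <stdio.h>

/* struct btreenode has an int data field and left/right child pointers. */
void inorder(struct btreenode *sr)
{
    if (sr != NULL)
    {
        inorder(sr->left);           /* left subtree first */
        printf("%d\n", sr->data);    /* then the data */
        inorder(sr->right);          /* then the right subtree */
    }
}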
Preorder traversal
void preorder(struct btreenode *sr)
{
    if (sr != NULL)
    {
        printf("%d\n", sr->data);    /* data first */
        preorder(sr->left);          /* then the left subtree */
        preorder(sr->right);         /* then the right subtree */
    }
}
Postorder traversal
void postorder(struct btreenode *sr)
{
    if (sr != NULL)
    {
        postorder(sr->left);         /* left subtree first */
        postorder(sr->right);        /* then the right subtree */
        printf("%d\n", sr->data);    /* data last */
    }
}
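To check the three orders without reading printed output, the sketch below uses variants of the functions above that append each visited value to an array. The names inorder_collect, preorder_collect and postorder_collect and the out/count parameters are hypothetical; the visiting order is exactly that of inorder, preorder and postorder.

```c
#include <stddef.h>

struct btreenode
{
    int data;
    struct btreenode *left, *right;
};

/* Appends sr's values to out[*count], out[*count + 1], ... in inorder. */
void inorder_collect(const struct btreenode *sr, int out[], int *count)
{
    if (sr != NULL)
    {
        inorder_collect(sr->left, out, count);
        out[(*count)++] = sr->data;      /* data between the subtrees */
        inorder_collect(sr->right, out, count);
    }
}

void preorder_collect(const struct btreenode *sr, int out[], int *count)
{
    if (sr != NULL)
    {
        out[(*count)++] = sr->data;      /* data before the subtrees */
        preorder_collect(sr->left, out, count);
        preorder_collect(sr->right, out, count);
    }
}

void postorder_collect(const struct btreenode *sr, int out[], int *count)
{
    if (sr != NULL)
    {
        postorder_collect(sr->left, out, count);
        postorder_collect(sr->right, out, count);
        out[(*count)++] = sr->data;      /* data after the subtrees */
    }
}
```

For a tree with root 1, left child 2 (whose children are 4 and 5) and right child 3, the preorder is 1 2 4 5 3, the inorder 4 2 5 1 3, and the postorder 4 5 2 3 1.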
Example 8.8
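[Figure: a binary search tree with root 63; 47 and 71 at the second level; 6, 54, 67 and 84 at the third level; 79 and 91 as the left and right children of 84]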
Creation
The first element in the list is made the root of the tree. Each following element is placed in the
left subtree of the root if it is less than the root and in the right subtree if it is greater than the
root, applying the same rule again within that subtree. In other words, creation is a combination of
search and insertion after the root node has been created.
Searching
The search always starts at the root node. If the value to be searched for is less than the value of
the current node, the left subtree is searched; if it is greater, the right subtree is searched. The
search continues until the node is found or until there is no branch left to proceed along.
Insertion
The steps involved in inserting a node are:
Search the tree for the value to be inserted (the search fails, since the value is not yet in the tree).
If the search ended at a node x, insert the new node as the left child of x if the new value is
less than x; otherwise insert it as the right child of x.
[Figure: the tree of Example 8.8 after inserting 15, which becomes the right child of 6]
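Creation, searching and insertion can be sketched together in C. This is a minimal sketch assuming integer keys and no duplicates (a duplicate insert is simply ignored, which is an assumption of the sketch, not a rule from the text); struct bstnode, bst_search, bst_insert and bst_create are hypothetical names. Insertion walks down exactly as the search does and attaches the new node at the null link where the search ended.

```c
#include <stdlib.h>

struct bstnode
{
    int data;
    struct bstnode *left, *right;
};

/* Returns the node holding key, or NULL if the search ends without
   a branch to follow. */
struct bstnode *bst_search(struct bstnode *root, int key)
{
    while (root != NULL && root->data != key)
        root = (key < root->data) ? root->left : root->right;
    return root;
}

/* Inserts key and returns the (possibly new) root. */
struct bstnode *bst_insert(struct bstnode *root, int key)
{
    struct bstnode **link = &root;      /* the link the search follows */

    while (*link != NULL)
    {
        if (key == (*link)->data)
            return root;                /* already present: ignored */
        link = (key < (*link)->data) ? &(*link)->left : &(*link)->right;
    }
    *link = malloc(sizeof **link);      /* attach where the search ended */
    if (*link != NULL)
    {
        (*link)->data = key;
        (*link)->left = (*link)->right = NULL;
    }
    return root;
}

/* Creation: the first value becomes the root; the rest are inserted. */
struct bstnode *bst_create(const int values[], int n)
{
    struct bstnode *root = NULL;
    int i;
    for (i = 0; i < n; i++)
        root = bst_insert(root, values[i]);
    return root;
}
```

Creating the tree from 63, 47, 71, 6, 54, 67, 84, 79, 91 gives the tree of Example 8.8, and inserting 15 then places it as the right child of 6.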
Deletion
The node to be deleted is first searched for from the root to find its position. Deletion is easiest if
the node to be deleted is a leaf node: the link from its parent is simply disconnected.
If the node to be deleted has a single subtree, its child takes its place.
If the node has both left and right subtrees, it is replaced by either its inorder predecessor (the
greatest node in its left subtree) or its inorder successor (the smallest node in its right subtree).
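[Figure: the tree with 15 inserted, and two results of deleting 71 from it: in one, 71 is replaced by its inorder predecessor 67; in the other, it is replaced by its inorder successor 79]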
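The three deletion cases can be sketched recursively as below. This sketch always uses the inorder successor in the two-subtree case (the predecessor would work equally well); the names bst_delete and bst_insert are hypothetical, and the small recursive insert is included only so that the block is self-contained.

```c
#include <stdlib.h>

struct bstnode
{
    int data;
    struct bstnode *left, *right;
};

/* Recursive insertion, included only so the example can build a tree. */
struct bstnode *bst_insert(struct bstnode *root, int key)
{
    if (root == NULL)
    {
        root = malloc(sizeof *root);
        if (root != NULL)
        {
            root->data = key;
            root->left = root->right = NULL;
        }
    }
    else if (key < root->data)
        root->left = bst_insert(root->left, key);
    else if (key > root->data)
        root->right = bst_insert(root->right, key);
    return root;
}

/* Deletes key (if present) and returns the new root of this subtree. */
struct bstnode *bst_delete(struct bstnode *root, int key)
{
    struct bstnode *child, *succ;

    if (root == NULL)
        return NULL;                          /* key not found */
    if (key < root->data)
        root->left = bst_delete(root->left, key);
    else if (key > root->data)
        root->right = bst_delete(root->right, key);
    else if (root->left != NULL && root->right != NULL)
    {
        /* two subtrees: copy the inorder successor (smallest value in
           the right subtree) here, then delete it from the right subtree */
        succ = root->right;
        while (succ->left != NULL)
            succ = succ->left;
        root->data = succ->data;
        root->right = bst_delete(root->right, succ->data);
    }
    else
    {
        /* leaf or single subtree: the child (possibly NULL) takes its place */
        child = (root->left != NULL) ? root->left : root->right;
        free(root);
        return child;
    }
    return root;
}
```

Deleting 71 from the tree with 15 inserted puts 79 in its place, matching the successor version in the figure; deleting 6 afterwards lets its only child 15 take its place.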
Advantage of a BST
Searching for a node in a BST is faster, since at each node only the left or the right subtree is
searched, from the root down to the node, instead of comparing all the nodes preceding it.
Disadvantage of a BST
The tree may become a skewed binary tree if the elements are inserted in ascending order (skewed
right) or in descending order (skewed left), which leads to more levels.
Summary
A tree is a collection of nodes arranged in a hierarchical fashion.
A binary tree is a tree with 2 as its maximum degree.
A tree can be represented using either an array or a linked list.
A binary tree can be traversed in 3 ways.
A binary search tree is a binary tree in which all the left descendants of a node are less
than it and all its right descendants are greater than it.
Answers
1. c
2. d
Learning Objectives
After completing this chapter you will be able to
Define a balanced tree
Identify how a balanced tree can be constructed from a Binary tree
Define hashing
List the advantages and disadvantages of Hashing
Overview:
Balanced trees are classified into two categories
Height Balanced tree
Weight Balanced tree
AVL Tree
An AVL tree is a height-balanced binary search tree. If the elements are inserted in almost sorted
order, a normal BST has many null branches; this leads to more levels and, in turn, needs more
space. This problem is solved by balancing the height whenever a node is inserted into an AVL
tree. Whether re-balancing is needed is decided from the balancing factor.
Balancing factor
The balancing factor of a node is the difference between the heights (numbers of levels) of its left
and right subtrees:
Balancing factor of X = height of left subtree of X - height of right subtree of X
If the balancing factor of every node in the tree is in the range -1 to 1, the tree is already
balanced; otherwise balancing is needed.
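The definition translates directly into recursive functions. In this sketch an empty subtree has height 0 and a single node height 1; any other consistent convention gives the same differences. The names height, balance_factor and is_balanced are hypothetical, and recomputing heights on every call is simple but slow, which is why practical AVL code usually stores the height or balancing factor in each node.

```c
#include <stdbool.h>
#include <stddef.h>

struct btreenode
{
    int data;
    struct btreenode *left, *right;
};

/* Height counted in levels: an empty tree has height 0, a single node 1. */
int height(const struct btreenode *t)
{
    int hl, hr;
    if (t == NULL)
        return 0;
    hl = height(t->left);
    hr = height(t->right);
    return 1 + (hl > hr ? hl : hr);
}

/* Balancing factor = height of left subtree - height of right subtree. */
int balance_factor(const struct btreenode *t)
{
    return height(t->left) - height(t->right);
}

/* True if every node's balancing factor lies in the range -1 to 1. */
bool is_balanced(const struct btreenode *t)
{
    int bf;
    if (t == NULL)
        return true;
    bf = balance_factor(t);
    return bf >= -1 && bf <= 1 && is_balanced(t->left) && is_balanced(t->right);
}
```

For the chain 50, 45, 30 (each node the left child of the previous one), the balancing factor of 50 is 2, so the tree needs balancing; after rebalancing to 45 with children 30 and 50, every factor is 0.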
NB. At most one rotation (single or double) is required after an insert operation.
Example 10.5: Constructing an AVL tree for the list of elements 50, 45, 30, 55, 63, 53
The upper part of each node represents the balancing factor and the lower part represents the data.
[Figure: AVL tree construction in stages: Insert 50, 45, 30 (LL rotation), Insert 55, Insert 63, and the final insertion of 53; the final tree has root 50 with children 45 and 55, where 45 has left child 30 and 55 has children 53 and 63]
Now, as with the insertion algorithm, we traverse back up the path to the root node, checking the
balance of all nodes along the path. If we encounter an unbalanced node we perform an
appropriate rotation to balance the node.
NB. Unlike the insertion algorithm, more than one rotation may be required after a delete
operation, so in some cases we will have to continue back up the tree after a rotation.
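The single and double rotations used for rebalancing can be sketched as pointer changes on a node and its child. This is a minimal sketch: it assumes the caller has already detected the imbalance and identified the case, it does not update stored heights or balancing factors, and the names rotate_ll, rotate_rr, rotate_lr and rotate_rl are hypothetical labels for the LL, RR, LR and RL cases.

```c
#include <stddef.h>

struct avlnode
{
    int data;
    struct avlnode *left, *right;
};

/* LL case (factor +2, new node in the left subtree of the left child):
   rotate right about t. Returns the new root of this subtree. */
struct avlnode *rotate_ll(struct avlnode *t)
{
    struct avlnode *l = t->left;
    t->left = l->right;      /* l's right subtree moves under t */
    l->right = t;
    return l;
}

/* RR case (factor -2, new node in the right subtree of the right child):
   rotate left about t. */
struct avlnode *rotate_rr(struct avlnode *t)
{
    struct avlnode *r = t->right;
    t->right = r->left;      /* r's left subtree moves under t */
    r->left = t;
    return r;
}

/* LR and RL cases are double rotations built from the single ones. */
struct avlnode *rotate_lr(struct avlnode *t)
{
    t->left = rotate_rr(t->left);
    return rotate_ll(t);
}

struct avlnode *rotate_rl(struct avlnode *t)
{
    t->right = rotate_ll(t->right);
    return rotate_rr(t);
}
```

Applying rotate_ll to the chain 50, 45, 30 makes 45 the root with children 30 and 50. Applying rotate_rl at 45 after 53 is inserted in Example 10.5 (45 with children 30 and 55, 55 with children 50 and 63, 53 as the right child of 50) yields root 50 with children 45 and 55.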
When working with large sets of data, it is often not possible or desirable to maintain the entire
structure in primary storage (RAM). Instead, a relatively small portion of the data structure is
maintained in primary storage, and additional data is read from secondary storage as needed.
Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly
slower than random access memory (RAM). In fact, the system often spends more time in
retrieving data than actually processing data.
B-trees are weight balanced trees that are optimized for situations when part or the entire tree
must be maintained in secondary storage such as a magnetic disk. Since disk accesses are
expensive (time consuming) operations, a b-tree tries to minimize the number of disk accesses.
For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion
keys but requires at most two disk accesses to search for any node (assuming the root node is kept
permanently in main memory).
B-Trees
The Structure of B-Trees
Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children. The
keys are stored in non-decreasing order. Each key has an associated child that is the root of a
subtree containing all nodes with keys less than or equal to the key but greater than the preceding
key. A node also has an additional rightmost child that is the root for a subtree containing all keys
greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization
factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain
circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys.
Every node may have at most 2t - 1 keys or, equivalently, 2t children.
Since each node tends to have a large branching factor (a large number of children), it is typically
necessary to traverse relatively few nodes before locating the desired key. If access to each node
requires a disk access, then a b-tree will minimize the number of disk accesses required. The
minimization factor is usually chosen so that the total size of each node corresponds to a multiple
of the block size of the underlying storage device. This choice simplifies and optimizes disk
access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside
in primary storage and accesses to secondary storage are comparatively expensive (or time
consuming).
Height of B-Trees
For n ≥ 1, any n-key b-tree T of height h and minimum degree t ≥ 2 satisfies
$h \le \log_t \frac{n+1}{2}$.
The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to
many other balanced tree structures, the base of the logarithm tends to be large; therefore, the
number of nodes visited during a search tends to be smaller than required by other tree structures.
Although this does not affect the asymptotic worst case height, b-trees tend to have smaller
heights than other trees with the same asymptotic height.
Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these
algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees
strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass
approach will reduce the number of node visits and thus the number of disk accesses. Simpler
double-pass approaches that move back up the tree to fix violations are possible.
Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage
(memory), all references to a given node must be preceded by a read operation denoted by Disk-Read.
Similarly, once a node is modified and it is no longer needed, it must be written out to secondary
storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes
referenced in parameters have already had a corresponding Disk-Read operation. New nodes are
created and assigned storage with the Allocate-Node call. The implementation details of the Disk-
Read, Disk-Write, and Allocate-Node functions are operating system and implementation
dependent.
B-Tree-Search(x, k)
    i <- 1
    while i <= n[x] and k > key_i[x]
        do i <- i + 1
    if i <= n[x] and k = key_i[x]
        then return (x, i)
    if leaf[x]
        then return NIL
        else Disk-Read(c_i[x])
             return B-Tree-Search(c_i[x], k)
The search operation on a b-tree is analogous to a search on a binary tree. Instead of choosing
between a left and a right child as in a binary tree, a b-tree search must make an n-way choice.
The correct child is chosen by performing a linear search of the values in the node. After finding
the value greater than or equal to the desired value, the child pointer to the immediate left of that
value is followed. If all values are less than the desired value, the rightmost child pointer is
followed. Of course, the search can be terminated as soon as the desired node is found. Since the
number of nodes visited depends upon the height of the tree, B-Tree-Search accesses $O(\log_t n)$
nodes; with a linear search inside each node, its total CPU time is $O(t \log_t n)$.
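An in-memory version of B-Tree-Search can be sketched in C as follows. It assumes 0-based key and child arrays, a small illustrative minimum degree T = 2, and that all nodes are already in memory, so Disk-Read is omitted; struct bnode and btree_search are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define T 2                      /* minimum degree t (illustrative value) */

struct bnode
{
    int n;                       /* number of keys currently stored */
    int key[2 * T - 1];          /* key[0..n-1] in non-decreasing order */
    struct bnode *c[2 * T];      /* children c[0..n]; unused if leaf */
    bool leaf;
};

/* Returns the node containing k and stores the key's index in *pos,
   or returns NULL if k is absent. */
struct bnode *btree_search(struct bnode *x, int k, int *pos)
{
    int i = 0;
    while (i < x->n && k > x->key[i])    /* linear scan within the node */
        i++;
    if (i < x->n && k == x->key[i])
    {
        *pos = i;
        return x;
    }
    if (x->leaf)
        return NULL;
    return btree_search(x->c[i], k, pos); /* descend into child i */
}
```

Searching a root holding 10 with leaves 5 8 and 13 17 25 finds 17 at index 1 of the right leaf and returns NULL for 9.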
B-Tree-Create(T)
x <- Allocate-Node()
leaf[x] <- TRUE
n[x] <- 0
Disk-Write(x)
root[T] <- x
The B-Tree-Create operation creates an empty b-tree by allocating a new root node that has no
keys and is a leaf node. Only the root node is permitted to have these properties; all other nodes
must meet the criteria outlined previously. The B-Tree-Create operation runs in time O(1).
B-Tree-Split-Child(x, i, y)
    z <- Allocate-Node()
    leaf[z] <- leaf[y]
    n[z] <- t - 1
    for j <- 1 to t - 1
        do key_j[z] <- key_(j+t)[y]
    if not leaf[y]
        then for j <- 1 to t
                 do c_j[z] <- c_(j+t)[y]
    n[y] <- t - 1
    for j <- n[x] + 1 downto i + 1
        do c_(j+1)[x] <- c_j[x]
    c_(i+1)[x] <- z
    for j <- n[x] downto i
        do key_(j+1)[x] <- key_j[x]
    key_i[x] <- key_t[y]
    n[x] <- n[x] + 1
    Disk-Write(y)
    Disk-Write(z)
    Disk-Write(x)
If a node becomes "too full," it is necessary to perform a split operation. The split operation moves
the median key of node y into its parent x, where y is the ith child of x. A new node, z, is allocated,
and all keys in y to the right of the median key are moved to z. The keys to the left of the median
key remain in the original node y. The new node, z, becomes the child immediately to the right of
the median key that was moved into the parent x, and the original node, y, becomes the child
immediately to the left of that median key.
The split operation transforms a full node with 2t - 1 keys into two nodes with t - 1 keys each; the
remaining key is moved into the parent node. The B-Tree-Split-Child algorithm runs in O(t) time,
which is constant for a fixed t.
B-Tree-Insert(T, k)
    r <- root[T]
    if n[r] = 2t - 1
        then s <- Allocate-Node()
             root[T] <- s
             leaf[s] <- FALSE
             n[s] <- 0
             c_1[s] <- r
             B-Tree-Split-Child(s, 1, r)
             B-Tree-Insert-Nonfull(s, k)
        else B-Tree-Insert-Nonfull(r, k)
B-Tree-Insert-Nonfull(x, k)
    i <- n[x]
    if leaf[x]
        then while i >= 1 and k < key_i[x]
                 do key_(i+1)[x] <- key_i[x]
                    i <- i - 1
             key_(i+1)[x] <- k
             n[x] <- n[x] + 1
             Disk-Write(x)
        else while i >= 1 and k < key_i[x]
                 do i <- i - 1
             i <- i + 1
             Disk-Read(c_i[x])
             if n[c_i[x]] = 2t - 1
                 then B-Tree-Split-Child(x, i, c_i[x])
                      if k > key_i[x]
                          then i <- i + 1
             B-Tree-Insert-Nonfull(c_i[x], k)
To perform an insertion on a b-tree, the appropriate node for the key must be located using an
algorithm similar to B-Tree-Search. Next, the key must be inserted into the node. If the node is not
full prior to the insertion, no special action is required; however, if the node is full, the node must
be split to make room for the new key. Since splitting the node results in moving one key to the
parent node, the parent node must not be full or another split operation is required. This process
may repeat all the way up to the root and may require splitting the root node. This approach
requires two passes. The first pass locates the node where the key should be inserted; the second
pass performs any required splits on the ancestor nodes.
Since each access to a node may correspond to a costly disk access, it is desirable to avoid the
second pass by ensuring that the parent node is never full. To accomplish this, the presented
algorithm splits any full nodes encountered while descending the tree. Although this approach may
result in unnecessary split operations, it guarantees that the parent never needs to be split and
eliminates the need for a second pass up the tree. Since a split runs in O(t) time, it has little
effect on the $O(t \log_t n)$ running time of B-Tree-Insert.
Splitting the root node is handled as a special case since a new root must be created to contain
the median key of the old root. Observe that a b-tree will grow from the top.
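The single-pass insertion can be sketched in C with 0-based arrays. This is a minimal in-memory sketch: it assumes T = 2 (at most three keys per node) purely for illustration, omits Disk-Read and Disk-Write, and aborts if memory runs out; btree_insert, insert_nonfull, split_child and allocate_node are hypothetical names mirroring the pseudocode above.

```c
#include <stdbool.h>
#include <stdlib.h>

#define T 2                          /* minimum degree t (illustrative) */

struct bnode
{
    int n;                           /* number of keys stored */
    int key[2 * T - 1];              /* key[0..n-1], non-decreasing */
    struct bnode *c[2 * T];          /* children c[0..n] if not a leaf */
    bool leaf;
};

/* Allocate-Node: an empty node (calloc sets n = 0). */
static struct bnode *allocate_node(bool leaf)
{
    struct bnode *x = calloc(1, sizeof *x);
    if (x == NULL)
        abort();                     /* out of memory: keep the sketch simple */
    x->leaf = leaf;
    return x;
}

/* B-Tree-Split-Child: x->c[i] is full. Its median key moves up into x,
   and its upper t-1 keys (and t children) move to a new node z. */
static void split_child(struct bnode *x, int i)
{
    struct bnode *y = x->c[i];
    struct bnode *z = allocate_node(y->leaf);
    int j;

    z->n = T - 1;
    for (j = 0; j < T - 1; j++)
        z->key[j] = y->key[j + T];
    if (!y->leaf)
        for (j = 0; j < T; j++)
            z->c[j] = y->c[j + T];
    y->n = T - 1;

    for (j = x->n; j >= i + 1; j--)  /* open a child slot for z */
        x->c[j + 1] = x->c[j];
    x->c[i + 1] = z;
    for (j = x->n - 1; j >= i; j--)  /* open a key slot for the median */
        x->key[j + 1] = x->key[j];
    x->key[i] = y->key[T - 1];
    x->n++;
}

/* B-Tree-Insert-Nonfull: x is known not to be full. */
static void insert_nonfull(struct bnode *x, int k)
{
    int i = x->n - 1;

    if (x->leaf)
    {
        while (i >= 0 && k < x->key[i])
        {
            x->key[i + 1] = x->key[i];   /* shift larger keys right */
            i--;
        }
        x->key[i + 1] = k;
        x->n++;
        return;
    }
    while (i >= 0 && k < x->key[i])
        i--;
    i++;                                 /* index of the child to descend */
    if (x->c[i]->n == 2 * T - 1)
    {
        split_child(x, i);               /* split full nodes on the way down */
        if (k > x->key[i])
            i++;
    }
    insert_nonfull(x->c[i], k);
}

/* B-Tree-Insert: returns the root, which changes when the old root is
   split (the tree grows from the top). A NULL root means an empty tree. */
struct bnode *btree_insert(struct bnode *root, int k)
{
    struct bnode *s;

    if (root == NULL)
        root = allocate_node(true);      /* B-Tree-Create */
    if (root->n == 2 * T - 1)
    {
        s = allocate_node(false);
        s->c[0] = root;
        split_child(s, 0);
        insert_nonfull(s, k);
        return s;
    }
    insert_nonfull(root, k);
    return root;
}
```

Inserting 10 17 25 9 13 16 8 5 15 22 in order with T = 2 gives a root holding 10 and 17 with leaves 5 8 9, 13 15 16 and 22 25. The B-Tree Insertion example that follows may use a different node capacity, so its trees need not match this one.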
B-Tree-Delete
Deletion of a key from a b-tree is possible; however, special care must be taken to ensure that the
properties of a b-tree are maintained. Several cases must be considered. If the deletion reduces
the number of keys in a node below the minimum of t - 1 keys, this violation must be
corrected by combining several nodes and possibly reducing the height of the tree. If the key is in
an internal node, its children must be rearranged.
B-Tree Insertion
Keys inserted, in order: 10 17 25 9 13 16 8 5 15 22
[Figure: the b-tree after each successive insertion; underlined elements are newly added]
Hashing
Hashing is a technique that improves the speed of search by calculating the address of the search
element directly, using a mathematical formula, instead of searching for it.
Symbol Table
A symbol table is a dictionary ADT used in a program: a set of names and their attributes. The
attributes stored for a name vary depending upon the application. For example:
Name : Identifier
Attribute : Initial value, list of lines using that identifier, etc.
Hashing techniques are used to search for, insert, and delete items (names and attributes). Instead
of comparing identifiers to perform a search, hashing uses a formula called a hash function,
h(x).
Dynamic Hashing:
In dynamic hashing the identifiers are stored in a dynamically sized table called the hash table;
the table size can be altered. The arithmetic function h(x) gives the address of x in the table. This
address is called the hash address or home address.
Overflow:
A new key k1 is mapped, or hashed, into the table. If the mapping leads to a table (or bucket) that is
already full, the key cannot be inserted there; this situation is called overflow.
Hash Collision:
When two different keys result in the same address after hashing, this is termed a collision.
Suppose that two keys k1 and k2 are such that h(k1) equals h(k2). When a record with key k1 is
entered into the table, it is inserted at position h(k1). But when k2 is hashed, because its hash
address is the same as that of k1, an attempt may be made to insert the record into the same
position where the record with key k1 is stored. Clearly, two records cannot occupy the same
position. Such a situation is called a hash collision or hash clash. Hash collisions are resolved
through techniques such as rehashing and double hashing.
Division
The key is divided by a number M and the remainder (modulo) is taken as the address:
$h_D(x) = x \bmod M$
The function returns a bucket address from 0 through M-1, so the hash table must have at least
b = M buckets. If M is a power of 2, then $h_D(x)$ depends only on the least significant bits of x;
since programmers tend to give variables the same suffix, this results in many collisions. If M is
divisible by 2, then odd keys are mapped to odd buckets and even keys to even buckets, which
biases the hash table and increases collisions. These difficulties can be avoided by choosing M to
be a prime number, so that the only factors of M are M and 1.
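The effect of the choice of M can be demonstrated with a small experiment. In this sketch hash_division and count_collisions are hypothetical names, keys are unsigned (C's % can yield a negative remainder for negative operands), and the collision counter only supports M up to 64.

```c
/* Division method: h_D(x) = x mod M, a bucket address in 0..M-1. */
unsigned hash_division(unsigned x, unsigned m)
{
    return x % m;
}

/* Counts keys that land in an already-used bucket when keys[0..n-1]
   are hashed with modulus m; returns -1 if m is outside 1..64. */
int count_collisions(const unsigned keys[], int n, unsigned m)
{
    int used[64] = {0};
    int i, collisions = 0;
    unsigned b;

    if (m == 0 || m > 64)
        return -1;                 /* outside what this sketch supports */
    for (i = 0; i < n; i++)
    {
        b = hash_division(keys[i], m);
        if (used[b])
            collisions++;          /* bucket already occupied */
        used[b] = 1;
    }
    return collisions;
}
```

The keys 3, 19, 35, 51, 67 share their low four bits, so with M = 16 they all land in bucket 3 (four collisions), while with the prime M = 17 they fall into five different buckets.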
Folding
The key x is divided into several parts, which are added together to get the final hash result.
Two types of folding methods are available:
Shift folding
Folding at the boundaries
In shift folding the parts are simply added together.
Example: key 74568392
74 + 56 + 83 + 92 = 305
Folding at the boundaries (Reverse Folding)
Parts in even positions are reversed and then the values are added together.
Example: key 74568392
74 + 65 + 83 + 29 = 251
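Both folding methods can be sketched as one C function. It assumes a non-negative decimal key whose digit count is a multiple of the part width, with parts taken from the left as in the examples; fold_hash and its parameters are hypothetical, and in practice the sum would typically still be reduced to the table size (for example with the division method).

```c
#include <stdio.h>

/* Reads `width` digits starting at s in reverse order, e.g. "56" -> 65. */
static long reverse_digits(const char *s, int width)
{
    long v = 0;
    int i;
    for (i = width - 1; i >= 0; i--)
        v = v * 10 + (s[i] - '0');
    return v;
}

/* Folds key into parts of `width` digits taken from the left and adds
   them. boundary == 0: shift folding; boundary != 0: folding at the
   boundaries (parts in even positions are reversed before adding).
   Assumes the number of digits is a multiple of width. */
long fold_hash(unsigned long key, int width, int boundary)
{
    char digits[32];
    long sum = 0, part;
    int len, pos, i, index = 1;          /* index = 1-based part position */

    len = snprintf(digits, sizeof digits, "%lu", key);
    for (pos = 0; pos < len; pos += width, index++)
    {
        if (boundary && index % 2 == 0)
            part = reverse_digits(digits + pos, width);
        else
        {
            part = 0;
            for (i = 0; i < width; i++)
                part = part * 10 + (digits[pos + i] - '0');
        }
        sum += part;
    }
    return sum;
}
```

With the key 74568392 and two-digit parts, fold_hash(74568392, 2, 0) returns 305 and fold_hash(74568392, 2, 1) returns 251.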
Digit Analysis:
This type of hashing is useful in the case of a static file, i.e., one in which all the identifiers in the
table are known in advance. Each identifier x is interpreted as a number using a radix r, and the
same radix is used for all identifiers in the table. Using this radix, the digits of each identifier are
examined, and the digits with the most skewed distributions are deleted. Enough digits are deleted
so that the number formed by the remaining digits is small enough to give an address in the range
of the hash table.
To Manage Overflow
The size of the table can be doubled, but this is wasteful.
A new page can be added at the end and the ids of one of the original pages divided between
that page and the new page. However, this complicates the family of hash functions.
In this scheme, the new id is placed as an overflow, a new page is created at the end, and the ids
in the first page are rehashed. Sometimes, however, no id from the first page moves to the new
page, which results in a non-uniform hash function. In general, if n new pages have been added,
the first n pages are rehashed. The rehashed pages and the new pages are addressed using r + 1 bits
(3 bits in the original example), while the pages that have not been rehashed are still addressed
using r bits.
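The splitting scheme described above appears to be linear hashing, in which split pages and the new pages use r + 1 address bits while the others keep r bits. A minimal sketch of that addressing rule, assuming integer keys, a file that started with 2^r pages, and a hypothetical counter split holding the number of pages already split; page_address is also a hypothetical name.

```c
#include <assert.h>

/* Page address of key x. The file started with 2^r pages, and pages
   0 .. split-1 have already been split (0 <= split < 2^r).
   Unsplit pages use the low r bits of the key; split pages and the
   new pages appended at the end use the low r + 1 bits. */
unsigned page_address(unsigned long x, unsigned r, unsigned split) {
    unsigned long pages = 1UL << r;
    assert(split < pages);
    unsigned long a = x & (pages - 1);       /* x mod 2^r */
    if (a < split)
        a = x & ((pages << 1) - 1);          /* x mod 2^(r+1) */
    return (unsigned)a;
}
```

With r = 2 and split = 1 (page 0 already split into pages 0 and 4), key 4 goes to page 4, key 8 stays in page 0, and key 5 stays in page 1.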
Summary
A balanced tree is a tree in which the number of levels is minimized by balancing the
height or the weight.
An AVL tree is a height-balanced tree; balancing is done through four possible rotations.
A B-tree is a weight-balanced tree; balancing maintains the number of elements
and subtrees in each node.
Hashing is the process of calculating the address of an item using a mathematical
formula instead of searching for it.
Answers
1) a
2) d
Learning Objectives
After completing this chapter, you will be able to:
Represent a graph using arrays and linked lists
Traverse a graph
Calculate a minimum cost spanning tree
Calculate the shortest route from a source to all other nodes
Graphs
Introduction
A graph is a collection of nodes (vertices) connected together through edges (arcs). Graphs are
used to model electrical circuits, chemical compounds, highway maps, and so on. They are also
used in the analysis of electrical circuits, finding the shortest route, project planning, linguistics,
genetics, social science, and so forth.
A graph can be shown pictorially. The vertices are drawn as circles, and a label inside the circle
represents the vertex. In an undirected graph, the edges are drawn using lines. In a directed
graph, the edges are drawn using arrows.
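Example 11.1: (a) an undirected graph and (b) a directed graph, each with the vertices A, B, C, D and E.
(a) Edges: A-B, A-C, A-D, B-E, C-E, D-E
(b) Edges: A→B, A→C, B→C, C→E, D→A, D→E, E→B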
Let G be an undirected graph, and let u and v be two vertices in G. Then u and v are called
adjacent if there is an edge from one to the other, that is, (u, v) ∈ E. Let e = (u, v) be an edge in G.
We then say that edge e is incident on the vertices u and v. An edge incident on a single vertex is
called a loop. If two edges, e1 and e2, are associated with the same pair of vertices, then e1 and e2
are called parallel edges. A graph is called a simple graph if it has no loops and no parallel edges.
There is a path from u to v if there is a sequence of vertices u_1, u_2, ..., u_n such that u = u_1,
u_n = v, and (u_i, u_{i+1}) is an edge for all i = 1, 2, ..., n - 1. Vertices u and v are called connected if
there is a path from u to v.
A simple path is a path in which all the vertices, except possibly the first and last vertices, are
distinct. A cycle in G is a simple path in which the first and last vertices are the same. G is called
connected if there is a path from any vertex to any other vertex. A maximal subset of connected
vertices is called a component of G.
Let G be a directed graph, and let u and v be two vertices in G. If there is an edge from u to v, that
is, (u, v) ∈ E, then we say that u is adjacent to v and v is adjacent from u. The definitions of paths
and cycles in a directed graph are similar to those for undirected graphs, except that each edge
must be followed in its direction. G is called strongly connected if, for any two vertices u and v in
G, there is a path from u to v and a path from v to u.
Graph Representation
A graph can be represented in several ways. Two common ways are adjacency matrices and
adjacency lists.
Adjacency Matrix
Let G be a graph with n vertices, where n > 0, and let V(G) = {v1, v2, ..., vn}. The adjacency matrix
AG is a two-dimensional n × n matrix such that the (i, j)th entry of AG is 1 if there is an edge from
vi to vj; otherwise, the (i, j)th entry is 0. For an undirected graph, the adjacency matrix is symmetric.
Example 11.2: Adjacency matrices for graphs 11.1 (a) and (b)
(a) Undirected graph
   A B C D E
A  0 1 1 1 0
B  1 0 0 0 1
C  1 0 0 0 1
D  1 0 0 0 1
E  0 1 1 1 0
(b) Directed graph
   A B C D E
A  0 1 1 0 0
B  0 0 1 0 0
C  0 0 0 0 1
D  1 0 0 0 1
E  0 1 0 0 0
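A minimal sketch of the adjacency matrix representation in C, assuming the vertices A, B, C, ... are numbered 0, 1, 2, ... and there are at most 26 of them; MatrixGraph and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_VERTICES 26

typedef struct {
    int n;                                   /* number of vertices */
    int adj[MAX_VERTICES][MAX_VERTICES];     /* adj[i][j] == 1 if edge vi -> vj */
} MatrixGraph;

/* Create a graph with n vertices and no edges. */
void matrix_init(MatrixGraph *g, int n) {
    assert(n > 0 && n <= MAX_VERTICES);
    g->n = n;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            g->adj[i][j] = 0;
}

/* Add the edge (u, v). An undirected edge is stored in both directions,
   which is why the matrix of an undirected graph is symmetric. */
void matrix_add_edge(MatrixGraph *g, int u, int v, int directed) {
    g->adj[u][v] = 1;
    if (!directed) g->adj[v][u] = 1;
}

/* Print the matrix with the vertex labels A, B, C, ... */
void matrix_print(const MatrixGraph *g) {
    printf(" ");
    for (int j = 0; j < g->n; j++) printf(" %c", 'A' + j);
    printf("\n");
    for (int i = 0; i < g->n; i++) {
        printf("%c", 'A' + i);
        for (int j = 0; j < g->n; j++) printf(" %d", g->adj[i][j]);
        printf("\n");
    }
}
```

Adding the six edges of graph 11.1 (a) with directed = 0 and calling matrix_print prints matrix (a) of Example 11.2; adding the edges of graph 11.1 (b) with directed = 1 gives matrix (b).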
Adjacency Lists
Let G be a graph with n vertices, where n > 0, and let V(G) = {v1, v2, ..., vn}. In the adjacency list
representation, there is a linked list corresponding to each vertex v, such that each node of the
linked list contains a vertex u for which (v, u) ∈ E(G). Because there are n vertices, we use an
array A of size n, such that A[i] is a pointer to the first node of the linked list containing the
vertices to which vi is adjacent. Each node has two components, say vertex and link. The
component vertex contains the index of a vertex adjacent to vertex i, and link points to the next node.
Adjacency lists for graph 11.1 (a):
A: B → C → D
B: A → E
C: A → E
D: A → E
E: B → C → D
Operations on Graphs
The operations commonly performed on a graph are as follows:
Create the graph. That is, store the graph in computer memory using a particular
graph representation.
Clear the graph. This operation makes the graph empty.
Determine whether the graph is empty.
Traverse the graph.
Print the graph.
How a graph is represented in computer memory depends on the specific application. For
illustration purposes, we use the adjacency list (linked list) representation of graphs. Therefore, for
each vertex v, the vertices adjacent to v (in a directed graph, also called the immediate
successors of v) are stored in the linked list associated with v.
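A minimal sketch of the adjacency list representation together with the create, clear, is-empty and print operations listed above, assuming vertices numbered 0 to n - 1 and printed as letters. ListGraph, AdjNode and the function names are hypothetical. New nodes are appended at the end of each list so that neighbours keep the order in which the edges were added.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_VERTICES 26

/* One node of a linked list: the index of an adjacent vertex and a link. */
typedef struct AdjNode {
    int vertex;
    struct AdjNode *link;
} AdjNode;

typedef struct {
    int n;                          /* number of vertices; 0 means empty */
    AdjNode *list[MAX_VERTICES];    /* list[i] points to the first node for vi */
} ListGraph;

/* Create: a graph with n vertices and no edges. */
void list_init(ListGraph *g, int n) {
    assert(n > 0 && n <= MAX_VERTICES);
    g->n = n;
    for (int i = 0; i < n; i++) g->list[i] = NULL;
}

/* Append v at the end of u's list. */
static void append(ListGraph *g, int u, int v) {
    AdjNode *node = malloc(sizeof *node);
    assert(node != NULL);
    node->vertex = v;
    node->link = NULL;
    AdjNode **p = &g->list[u];
    while (*p != NULL) p = &(*p)->link;
    *p = node;
}

/* Add the edge (u, v); an undirected edge goes into both lists. */
void list_add_edge(ListGraph *g, int u, int v, int directed) {
    append(g, u, v);
    if (!directed) append(g, v, u);
}

/* Clear: free every node and leave a graph with no vertices. */
void list_clear(ListGraph *g) {
    for (int i = 0; i < g->n; i++) {
        AdjNode *p = g->list[i];
        while (p != NULL) {
            AdjNode *next = p->link;
            free(p);
            p = next;
        }
        g->list[i] = NULL;
    }
    g->n = 0;
}

int list_is_empty(const ListGraph *g) { return g->n == 0; }

/* Print each vertex followed by its adjacent vertices, e.g. "A: B C D". */
void list_print(const ListGraph *g) {
    for (int i = 0; i < g->n; i++) {
        printf("%c:", 'A' + i);
        for (const AdjNode *p = g->list[i]; p != NULL; p = p->link)
            printf(" %c", 'A' + p->vertex);
        printf("\n");
    }
}
```

Adding the edges of graph 11.1 (a) in the order A-B, A-C, A-D, B-E, C-E, D-E and calling list_print gives the same lists as the table for graph 11.1 (a), starting with "A: B C D".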
Graph Traversals
Processing a graph requires the ability to traverse the graph. Traversing a graph is similar to
traversing a binary tree, except that traversing a graph is a bit more complicated. Recall that a
binary tree has no cycles and that, starting at the root node, we can traverse the entire tree. A
graph, on the other hand, might have cycles, and we might not be able to traverse the entire graph
from a single vertex (for example, if the graph is not connected). Therefore, we must keep track of
the vertices that have been visited, and we must start a traversal from each vertex that has not yet
been visited. This ensures that the entire graph is traversed. The two most common graph
traversal algorithms are the depth first traversal and the breadth first traversal. For simplicity, we
assume that when a vertex is visited, its index is output, and that each vertex is visited only once.
We use a Boolean array, visited, to keep track of the visited vertices.
The depth first traversal starts at a vertex, visits it, and then recursively visits each unvisited vertex
adjacent to it, going as deep as possible along each branch before backing up; it can be
implemented with recursion or with a stack.
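A minimal recursive sketch of the depth first traversal, assuming an adjacency matrix with vertices numbered 0 to n - 1 and neighbours examined in increasing index order. Instead of printing, it records the visit order in an array; dfs_visit and depth_first_traversal are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTICES 26

/* Visit v, then recursively visit each unvisited vertex adjacent to it,
   in increasing index order. The visit order is appended to order[]. */
static void dfs_visit(int n, int adj[][MAX_VERTICES], int v,
                      bool visited[], int order[], int *count) {
    visited[v] = true;
    order[(*count)++] = v;
    for (int w = 0; w < n; w++)
        if (adj[v][w] && !visited[w])
            dfs_visit(n, adj, w, visited, order, count);
}

/* Depth first traversal of the whole graph: restart from every vertex that
   is still unvisited, so a disconnected graph is covered too.
   Returns the number of vertices written to order[]. */
int depth_first_traversal(int n, int adj[][MAX_VERTICES], int order[]) {
    assert(n > 0 && n <= MAX_VERTICES);
    bool visited[MAX_VERTICES] = {false};
    int count = 0;
    for (int v = 0; v < n; v++)
        if (!visited[v])
            dfs_visit(n, adj, v, visited, order, &count);
    return count;
}
```

For graph 11.1 (a), with A to E numbered 0 to 4, the recorded order is 0, 1, 4, 2, 3, that is A, B, E, C, D.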
As in the case of the depth first traversal, because it might not be possible to traverse the entire
graph from a single vertex, the breadth first traversal also traverses the graph from each vertex
that has not been visited. Starting at the first vertex, the graph is traversed as much as possible; we
then go to the next vertex that has not been visited. Within each traversal, all vertices adjacent to
the current vertex are visited before the vertices adjacent to those, so the vertices are visited level
by level. To implement the breadth first traversal, we use a queue.
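A minimal sketch of the breadth first traversal with an array-based queue, under the same assumptions as the depth first sketch (adjacency matrix, vertices 0 to n - 1, neighbours in increasing index order); breadth_first_traversal is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VERTICES 26

/* Breadth first traversal. Starting from each unvisited vertex in turn,
   a vertex is dequeued and all of its unvisited adjacent vertices are
   marked and enqueued, so vertices nearer the start are visited first.
   Returns the number of vertices written to order[]. */
int breadth_first_traversal(int n, int adj[][MAX_VERTICES], int order[]) {
    assert(n > 0 && n <= MAX_VERTICES);
    bool visited[MAX_VERTICES] = {false};
    int queue[MAX_VERTICES];          /* each vertex is enqueued at most once */
    int count = 0;
    for (int s = 0; s < n; s++) {
        if (visited[s]) continue;
        int front = 0, rear = 0;
        visited[s] = true;
        queue[rear++] = s;
        while (front < rear) {
            int v = queue[front++];
            order[count++] = v;
            for (int w = 0; w < n; w++)
                if (adj[v][w] && !visited[w]) {
                    visited[w] = true;    /* mark when enqueued, not when dequeued */
                    queue[rear++] = w;
                }
        }
    }
    return count;
}
```

Marking a vertex when it is enqueued, rather than when it is dequeued, keeps it from entering the queue twice. For graph 11.1 (a) the order is A, B, C, D, E.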
Example 11.4: The depth first traversal of the undirected graph in Example 11.1 (a), starting at A and
taking adjacent vertices in alphabetical order:
A, B, E, C, D
The breadth first traversal of the same graph, under the same conventions:
A, B, C, D, E
Let G be a weighted graph. Let u and v be two vertices in G, and let P be a path in G from u to v.
The weight of the path P is the sum of the weights of all the edges on the path P, which is also
called the weight of v from u via P.
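A small sketch of this definition, assuming the edge weights are kept in an adjacency matrix in which a hypothetical NO_EDGE value (-1) marks a missing edge, so all real weights are assumed non-negative; path_weight is also a hypothetical name.

```c
#include <assert.h>

#define MAX_VERTICES 26
#define NO_EDGE (-1)   /* assumed marker for "no edge" */

/* Weight of the path P = path[0], path[1], ..., path[len-1]: the sum of the
   weights of its edges (path[i], path[i+1]). Returns -1 if some consecutive
   pair of vertices is not joined by an edge, i.e. P is not a path in G. */
long path_weight(int weights[][MAX_VERTICES], const int path[], int len) {
    assert(len >= 1);
    long total = 0;
    for (int i = 0; i + 1 < len; i++) {
        int w = weights[path[i]][path[i + 1]];
        if (w == NO_EDGE) return -1;
        total += w;
    }
    return total;
}
```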
Let G be a weighted graph representing a highway structure, in which the weight of an edge
represents the travel time. For example, to plan monthly business trips, a salesperson wants to
find the shortest path (that is, the path with the smallest weight) from her or his city to every other
city in the graph. Many such problems exist in which we want to find the shortest path from a given
vertex, called the source, to every other vertex in the graph. This section describes the shortest
path algorithm developed by Dijkstra, which is a greedy algorithm.
Shortest Path
Given a source vertex, called vertex, the shortest path algorithm proceeds as follows:
1. Initialize the array smallestWeight so that
smallestWeight[u] = weights[vertex, u] for every vertex u, where weights[vertex, u] is infinity
if there is no edge (vertex, u).
2. Set smallestWeight[vertex] = 0 and mark vertex as the first vertex whose smallest weight is found.
3. Find the vertex v, closest to vertex, for which the shortest path has not yet been
determined.
4. Mark v as the (next) vertex for which the smallest weight is found.
5. For each vertex w in G such that the shortest path from vertex to w has not been
determined and an edge (v, w) exists, if the weight of the path to w via v is smaller than its
current weight, update the weight of w to the weight of v + the weight of the edge (v, w).
6. Repeat steps 3 through 5 n - 1 times, where n is the number of vertices.
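A minimal sketch of the steps above in C. The names smallestWeight, weights and vertex follow the text; shortest_path, weightFound and the INF marker for a missing edge are assumptions. Edge weights are assumed non-negative, which Dijkstra's algorithm requires.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_VERTICES 26
#define INF (INT_MAX / 2)   /* "no edge"; halved so INF + weight cannot overflow */

/* weights[u][w] is the weight of edge (u, w), or INF if there is none.
   On return smallestWeight[u] is the weight of a shortest path from
   `vertex` to u (INF if u cannot be reached). */
void shortest_path(int n, int weights[][MAX_VERTICES], int vertex,
                   int smallestWeight[]) {
    assert(n > 0 && n <= MAX_VERTICES);
    bool weightFound[MAX_VERTICES] = {false};

    for (int u = 0; u < n; u++)                    /* step 1 */
        smallestWeight[u] = weights[vertex][u];
    smallestWeight[vertex] = 0;                    /* step 2 */
    weightFound[vertex] = true;

    for (int i = 0; i < n - 1; i++) {              /* step 6: n - 1 rounds */
        int v = -1;
        for (int u = 0; u < n; u++)                /* step 3: closest undetermined */
            if (!weightFound[u] &&
                (v == -1 || smallestWeight[u] < smallestWeight[v]))
                v = u;
        if (smallestWeight[v] >= INF) break;       /* the rest are unreachable */
        weightFound[v] = true;                     /* step 4 */
        for (int w = 0; w < n; w++)                /* step 5: relax edges (v, w) */
            if (!weightFound[w] && weights[v][w] < INF &&
                smallestWeight[v] + weights[v][w] < smallestWeight[w])
                smallestWeight[w] = smallestWeight[v] + weights[v][w];
    }
}
```

For a hypothetical four-vertex graph with edges 0-1 (4), 0-2 (1), 1-2 (2), 1-3 (1) and 2-3 (5), source 0 gets smallestWeight = {0, 3, 1, 4}. Scanning all vertices for the closest one makes each round O(n), so this matrix version runs in O(n^2) time.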
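[Figure: Dijkstra's algorithm on an example weighted graph with the vertices A, B, C and D. Source: A. Using the direct costs, the first edge selected is A-B.]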
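A spanning tree of a connected graph G is a tree that contains all the vertices of G and a subset of
its edges. In general, it is possible to construct multiple spanning trees for a graph G. If a cost, cij,
is associated with each edge, eij = (vi, vj), then the minimum spanning tree is the set of edges, Espan,
forming a spanning tree, such that the total cost
$$C = \sum_{e_{ij} \in E_{span}} c_{ij}$$
is minimum.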
Kruskal's Algorithm
This algorithm creates a forest of trees. Initially, the forest consists of n single-node trees (and no
edges). The edges are considered in increasing order of cost, and at each step the cheapest
remaining edge that joins two different trees is added, joining them into one. An edge whose end
points are already in the same tree would form a cycle, because it would simply link two nodes that
are already connected; such an edge is not needed and is discarded.
Every step joins two trees in the forest together, so after n - 1 edges have been added only one
tree, the minimum spanning tree T, remains.
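A minimal sketch of Kruskal's algorithm in C. The text does not say how to tell whether two nodes are already in the same tree; this sketch assumes a union-find (disjoint set) array, parent, for that check. Edge, by_cost, find_root and kruskal are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int u, v, cost; } Edge;

/* qsort comparator: cheaper edges first. */
static int by_cost(const void *a, const void *b) {
    const Edge *x = a, *y = b;
    return (x->cost > y->cost) - (x->cost < y->cost);
}

/* Root of the tree containing i in the forest (with path halving). */
static int find_root(int parent[], int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Kruskal's algorithm on a graph with n vertices (0..n-1) and m edges.
   Sorts edges[] in place, copies the chosen edges into tree[] (room for
   n - 1 edges) and returns how many were chosen (n - 1 if G is connected). */
int kruskal(int n, Edge edges[], int m, Edge tree[]) {
    assert(n > 0);
    int *parent = malloc((size_t)n * sizeof *parent);
    assert(parent != NULL);
    for (int i = 0; i < n; i++) parent[i] = i;      /* n single-node trees */

    qsort(edges, (size_t)m, sizeof *edges, by_cost);
    int chosen = 0;
    for (int k = 0; k < m && chosen < n - 1; k++) {
        int ru = find_root(parent, edges[k].u);
        int rv = find_root(parent, edges[k].v);
        if (ru == rv) continue;        /* same tree: the edge would form a cycle */
        parent[ru] = rv;               /* join the two trees */
        tree[chosen++] = edges[k];
    }
    free(parent);
    return chosen;
}
```

For the hypothetical edges 0-1 (4), 0-2 (1), 1-2 (2), 1-3 (1) and 2-3 (5), the chosen edges are 0-2, 1-3 and 1-2, with total cost 4.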
[Figure: Kruskal's algorithm applied step by step to an example weighted graph with the vertices A, B, C, D and E.]
Summary
A graph is a collection of nodes connected together using edges.
A graph can be traversed using depth first search (DFS) or breadth first search (BFS).
The shortest paths from a source vertex to all other vertices can be calculated using Dijkstra's
algorithm.
A spanning tree is an acyclic connected subgraph that contains all the vertices of the graph.
Minimum cost spanning trees can be derived using Kruskal's algorithm.
Answers
1. b
2. b
Glossary
Abstract data type A formal, language-independent description of data elements, the
relationships among them, and the operations that act upon them.
Acyclic graph A graph without any cycles.
Adjacency list A method that uses linked lists to represent the edges of a graph or
network.
Adjacency matrix A method that uses a matrix to represent the edges of a graph or
network.
Adjacent Two nodes of a graph are adjacent if they are connected by an
edge.
Arc An edge of a graph that establishes a directional orientation
between its end points.
AVL Tree A tree in which, for each node, the difference between the height
of its left sub tree and the height of its right sub tree is at most
one.
Balance factor For a node in a binary tree, the difference between the height of
its left subtree and the height of its right subtree.
Big-O analysis A technique in which the time and space requirements of an
algorithm are estimated in order-of-magnitude terms.
Binary search The process of examining the middle value of a sorted array to see
which half contains the value in question, and continuing to halve
the search range until the value is located.
Binary search tree A binary tree with the ordering property: for each node, the keys in
its left subtree are smaller than the node's key and the keys in its
right subtree are larger.
Binary tree A tree in which each node has at most two sub trees.
Breadth first search A traversal of all nodes in a graph that proceeds from each node by
first visiting all nodes adjacent to that node.
B-tree An efficient, flexible index structure often used in DBMS on
random access files.
Bubble sort A sorting method that repeatedly compares adjacent elements of an
array and swaps them if they are out of order, until the elements are
in ascending or descending order.
Bucket In bucket hashing, a contiguous region of storage locations.
Children Nodes pointed to by an element in a tree.
Circular linked list A linked list in which the last node of the list points to the first
node in the list.
Clustering Occurs when a collision resolution strategy causes keys that have
a collision at an initial hashing position to be relocated to the
same region within the storage space.
Collision Condition in which more than one key hashes to the same
position with a given hashing function.
Cycle A path of a graph which originates and terminates at the same
node.
Level All nodes in a tree whose paths from the root node have the
same length.
Linear search The process of examining the first element in a list and
proceeding to examine the elements in order until a match is
found.
Link A pointer from one node to another.
Linked List A collection of nodes connected through pointers in a linear
fashion. Each node is divided into two parts: data and link.
Minimum spanning tree A collection of edges of minimum total weight connecting all of
the nodes of a graph without any cycle.
Node A structure storing a data item in a linked list, tree or graph.
Out degree of node The number of directed edges that originate at the node.
Overflow In linked collision processing, the area in which keys that cause
collisions are placed.
Parent In a tree the node that is pointing to its children.
Partition In quick sort, the process of moving the pivot to the location where
it belongs in the sorted array and arranging the remaining data
items so that those less than or equal to the pivot are to its left
and those greater than or equal to the pivot are to its right.
Path A sequence of edges that connects two nodes in a graph.
Pivot Item used to direct the partitioning of quick sort.
Pointer A memory location containing the location of another data item.
Postorder traversal A binary tree traversal in which the nodes are traversed in the
order of left subtree, right subtree and the node.
Preorder traversal A binary tree traversal in which the nodes are traversed in the
order of node, left sub tree and the right sub tree.
Priority queue A queue in which elements are deleted in order of priority rather
than in order of arrival.
Queue A data structure in which the elements are added at one end and
removed from the other end. Referred to as a FIFO.
Quick sort A relatively fast sorting technique that uses recursion and a
partitioning algorithm.
Rehashing A method of handling a collision in which a sequence of new
hashing functions is applied to the key that caused the collision
until an available location for the key is found.
Sibling Children of the same node.
Stack A data structure in which the elements are accessed from one
end. Referred to as a LIFO.
Tree A collection of nodes arranged in a hierarchical fashion.
Weight A numeric value associated with an edge in a graph.
Weight balancing Maintaining the number of elements that can be handled in a single
node of a tree.