You are on page 1of 15


Self balancing binary trees

a self-balancing (or height-balanced) binary search tree is any nodebased binary search tree that automatically keeps its height (maximal number
of levels below the root) small in the face of arbitrary item insertions and
Since we know that on binary search trees (BST), most of the operations take
time in accordance with the height of the tree, hence we strive to keep its
height minimal with respect to the number of elements entered. And
operations hence take less time to perform. So we dont have to worry about
the performance overhead in search, however insertion takes more time,
since we know that insertion is less performed operation than search, so this
approach is acceptable in such a scenario.
Avl tree.

AVL tree is a self-balancing binary seach tree [BST], which was introduced
in1962 in a research paper (An algorithm for the organization of
information) by Soviet mathematicians, namely:
Georgy Adelson-Velsky
Evgenii Mikhailovich Landis
Time complexity

Worst case time complexity for search, delete and insert is O (log n).
where n = the total number of nodes in the tree.
However rebalancing after insertion and deletion may take some extra time
which has not been accounted for in this time complexity analysis.
Total time taken for adjustment and rotation is O(1).

Working of AVL tree:

AVL tree is used in applications in which search is the most performed

operation. after each insertion or deletion, it is checked from the current place
of insertion to the root. if the braches are sagging more than one level from
the level of other branches, then the rotation is done.
Balance factor:

Balance factor of an AVL tree is calculated by following formula at each node:

Height of left subtree Height of right subtree
If the balancing factor is less than -2 or more than 2 then rotation is done in
case of mildly balanced tree in order to reduce the imbalance in the tree,
however in strictly balanced trees, height of both subtrees should be equal.

Rotation mechanism
Left rotation

Consider the tree shown in the diagram on the right,

we can see that it is unbalanced on node 4 according
to the rule of balance factor, what we will have to
rotate the neighboring nodes so balance could be
achieved, and AVL tree will remain in its healthy
state, and does not degenerate into an unbalanced

Now, since we know that the AVL tree still work

on the principle that:
left child < parent < right child
so after the rotation, we can see (in the picture on
the right) that node 5 has taken the place of parent

instead of node 4 which has moved to the status of left child. And node 6 has
been moved to the place of right child.
Right rotation:

Now as indicated in the picture that an element is

added below the leaf X, so we know that the rotation is
compulsory since the tree will be unbalanced. hence to
balance the tree, we have to move node m to the right
hand side f the node n, such that node Z will become
the child of node m. and since node X is already at
height+1, it will be appended directly below the root,
so as to balance the height on both left and right sub trees.

Node Y has remained a child of node m. In the scheme of

rotation, we can see that it has been strived to preserve the
parent child relation, in order to simplify the process.

Advantages of using AVL tree:

1. Time complexity for search is O(log N) since AVL trees are balanced at
all times.
2. Operations of insertion and deletion also have a time complexity of
O(log n)
3. Speed of insertion is not so much impeded by the balancing operation
since it only adds a constant time factor to the operation.
Disadvantages of using AVL tree:
1. AVL tree is difficult to realize in form of code; more space is required for
balancing factor.
2. AVL tree serves to enable faster operations but rebalancing is a costly
operation in terms of time.

Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and
Searching, Second Edition. Addison-Wesley, 1998.
Wikipedia article: Self balancing binary search tree
Wikipedia article: AVL tree
Lectures from University of Washington website
Lectures from University of Wisconsin website

Java Collection Framework

What is a collection?
Collection is an object that combines/collects multiple elements into a single unit.

What is a collections framework?

A collections framework is an architecture which is a combination of different
programming ingredients made for the sole purpose of using collections,
representing them and manipulating them.
A collection framework comprises of these three things:
1. Algorithms: Algorithms are the step by step approach to achieve the
purpose of representing data as we see in a structure theoretically.
2. Implementations: implementations are simply the implementation of the
algorithms which enable the user to use the collection in their own way to
satisfy the needs of a program.
3. Interfaces: Interfaces are the abstract classes through which we can access
the implementations of the algorithms.
Collections can be used to store, access, manipulate, and communicate aggregate
data or perform different operations on it. Collections represent data items that
belong to a similar group of elements, like:
A dictionary (collection of valid words in a given language)
A deck of cards (collection of valid and playable cards representing a card
system) etc.

Advantages of using JCF (Java Collection Framework)

Following some of the advantages that are offered by the used of JCF:

Reduction in effort required in programming: by providing ready made

and reusable structures to use in the programs, so the programmer does not
have to write those algorithms him/herself.
Increase in performance: by adding in productivity via better
implementations of data structures that are to be used, also that various
interfaces offer different implementations to suit the need of the specific
Provision of compatibility and interoperability between unrelated APIs:
because using java as a common language these APIs can pass the data
amongst themselves.
Reduces the learning cost: by requiring us to learn different collection
Reduces the effort required to design and implement APIs: by not
requiring you to produce ad hoc collections APIs.
Encourages collection reuse: via implementing different algorithms, and
interfaces with which to use and manipulate them to owns benefit.

Architecture and interfaces:

Following are some core collection interfaces:
Collections are the root structure of collection framework. It represents a
combination of objects known as its members. Some types of collections also
allow similar elements in a set, while others dont. ordered and unordered are
another variety present in the realm of collections. Java platform doesnt have
any implementations as it is for the collections, but set and list are the
subinterfaces that apply the thought behind the collections. Add, clear, clearAll,
retainAll are some of individual and bulk operations that the collection class
Set is a collection of unique elements i.e. it cannot contain duplicate elements.
This interface is modeled after the representation of sets as in mathematics.
ContainsAll, removeAll, retainAll, addAll are some of the bulk operations that can
be performed on sets.

List is an ordered set of elements. It can have similar elements more than once,
i.e. it can have or can not have unique elements. Every element has an index
number and members can be manipulated via these index numbers. Sort, Shuffle,
rotate, binarySearch, indexOfSublist are the examples of the operations that can
be performed on the lists.
This collection is very common. Most common use for this data structure is to
hold elements before they are to be entered in for processing. A queue comprises
of additional methods which are insertion, extraction etc. queues implement FIFO
(first in first out) sequence of the element entry and removal of elements, but this
not a hard and fast rule. Priority queues are the example of such queues which
defy FIFO structure. Insert, remove and examine are some of the examples of
operations that can be performed on the queues.
Dequeues are double ended queues, they can be operated on from both ends of
the pipeline, they implement both FIFO (Fist in first out) and LIFO (Last in first
out). addFirst, addLast, peekFirst, peekLast are some of the examples of the
operations that can be performed on Dequeues.
Map is basically an object which points to values through keys. Duplicate keys
cannot be entered because that would lead to confusing pointing and retrieval of
information. Java implements maps in three basic forms i.e. Treemaps,
linkedHashMap, HashMaps.
Sorted Set
Sorted sets are different from sets in manner that its elements are arranged in
sorted way. Advantage of sorting is taken using different new operations that are
not present in the Set interface. These find application in dictionaries, roll
numbers of students and other different areas of applications.

Sorted Map
Sorted maps are different from the Maps in the way that their mappings keys are
maintained in ascending order. Otherwise they are same.
Legacy classes:
Legacy classes have been retrofitted into the collections framework, they can be
used but they are not as efficient as newer data structures. however, the list of
present classes have been mentioned below.
Hashtable: newer alternative would be Hashmap.
Enumeration: collection and iterators should be used instead, but
Collection.Enumeration library can be used if needed.
Vectors: Arraylist can be used to replace this data structure.
Stack: linkedlists can be used in place of stacks.
Bitset: Arraylist of Boolean type variables can be used to replace the BitSet Data
structure, but obvious disadvantage would be wastage of space, because one bit
would take 8 bits to represent an array.

Object Oriented Software Construction (Lecture notes) University of Auckland
Chapter no. 22 Java collections framework
Liang, Introduction to Java programming, Seventh edition, Pearson

Java collections, software engineering notes, University of Washington
Wikipedia article: Java collections framework
Tutorial: Introduction to Java collections (Sun corporation)
Java collections main page
Java collections framework: overview
JCF Lesson: Interfaces (The Java collections > Collections)

Shortest path algorithm

Graph theory and directed graphs:
In order to understand the shortest path algorithms we will have to understand what
digraphs are. Graph theory is a branch of mathematics, more specifically of discrete
mathematics. Basically the shortest paths using different algorithms are applied to
digraphs which are a part of the graph theory.
A directed graph / digraph is a set of nodes connected by edges / paths which either have
or dont have a direction associated with them.
Graphs are expressed in the notation G=(V,A) where:
V denotes the nodes or in other words, vertices.
A is a set of ordered pairs of vertices which are also called edges in a graph, they can
be directed or non-directed.

Analogy of digraphs to road maps

An analogy can be given from the map of a city where the shortest path between various
destinations can be found out by traversing the path between the destinations. Nodes of the
graphs correspond to the intersections on the road map, and edges are road segments,
weights can be assigned to the segments according to the length of the road segments.

Shortest path problem:

Shortest path problems are defined in Wikipedia as:
In graph theory, the shortest path problem is the problem of finding a path
between two vertices (or nodes) in a graph such that the sum of the weights of its
constituent edges is minimized.

Categorization of Shortest path problems

Shortest path algorithms can be categorized as follows:

Single pair shortest path problem

single source shortest path problem
single destination shortest path problem
all pairs shortest path problem

Shortest path algorithms:

Following are different shortest path algorithms which can be used in different scenarios
and settings, their efficiency can be different based on the type of problem given:

Dijkstra's algorithm
BellmanFord algorithm
A* search algorithm
FloydWarshall algorithm
Johnson's algorithm
Viterbi algorithm

Some of the algorithms will be explained here briefly:

Floyd-Warshalls shortest path algorithm:

The FloydWarshall algorithm was published by Robert Floyd in 1962. However, it is
similar to the algorithms previously published by Bernard Roy in 1959.
FloydWarshall algorithm is also known as Floyd's algorithm, RoyWarshall algorithm,
RoyFloyd algorithm. it is a graph analysis algorithm for finding the shortest path in a
weighted digraph with positive or negative edge weights, but comprising of negative cycles.
A single execution will find the lengths of the shortest paths between all paired nodes. Time
complexity in the worst case scenario is O (|V|3)

1. it compares all possible paths through the graph between each paired node.
2. It does this with the help of (|V |3) comparisons in a graph.
3. there may be up to (|V |2) edges in the graph, and every combination of edges is
4. It does so by step by step improving the estimate on the shortest path between two
5. until the estimate is optimum the loop is continued.

Dijkstras shortest path algorithm:

Dijkstra's algorithm was published by Edsger Dijkstra in 1959.
It is a graph search algorithm that solves the single-source shortest path problem. It is used
in graphs that cannot use negative edge cost. It produces a shortest path tree. This
algorithm is often used in routing along with other graph algorithms.

A source node is given in the graph in which the shortest path is to be found, the algorithm
finds the path with lowest edge cost which obviously is the shortest path from that node to
every other node in the graph (single source - multiple destination problem). It can also be
used to find the shortest path in single source single destination problems by stopping
the algorithm once the algorithm has reached the destination and shortest path is achieved.
Same road map analogy can be applied here. For example, if we want to find the shortest
distance from our current location to all other destinations or road intersections in the city,
then Dijkstras algorithm can be used, or in the second case as we have discussed, we will
have to stop midway in order to find the shortest path from current location to a specific
location. But the algorithm will also find the shortest paths for all the destinations in
between of that specific path.

Source node is called initial node. Let the distance of node Y be the distance from the initial
node to Y. Following steps will be followed to find the shortest path from initial node to
node Y.
1. An initial value is set to all nodes. This is 0 for initial node and infinity for all other
2. All nodes are marked unvisited. Initial node is set as current. A set called unvisited
set is created comprising of all the unvisited nodes.
3. Consider all of its unvisited neighbors of the current node and calculate their
tentative distances. Compare the newly calculated distance to the current assigned
value and assign the smaller one, if the previous value was greater.
4. When the distance is measured for all the neighbors from the current node, set the
current node as visited and eliminate it from the unvisited set.
5. If the destination node is set to visited (in single source single destination
problem) or if the smallest tentative distance from current node to the nodes in
unvisited set is infinity (in single source multiple destination proble, when the
unvisited nodes are disjointed from the current node) then the algorithm has
6. Select the unvisited node that is marked with the smallest tentative distance, and set
it as the new "current node" then go back to step 3.

Comparison between Floyd-Warshall and Dijkstras

Floyd-Warshall Algorithm:
1. Finds a shortest-path for all node-pairs (x, y).
2. We can have one or more links of negative cost, c(x, y)<0,but

3. no cycle of negative cost. (Assume that c(xi,x)=0for each

4. node xi, which is the same as not having the links (x).)
5. Complexity: O(N3), where N = number of nodes in the digraph.

Dijkstras shortest path Algorithm:

1. Finds shortest path from a given startNode to all other nodes reachable from it in a
2. Assumes that each link cost c(x, y) = 0.
3. Complexity: O(N2), N = number of nodes in the digraph

Applications of Shortest path algorithms:

Maps and navigation

Shortest-paths is a broadly useful problem-solving model
Robot navigation.
Texture mapping.
Urban traffic planning.
Optimal pipelining of VLSI chip.
Subroutine in advanced algorithms.
Telemarketer operator scheduling.
Routing of telecommunications messages.
Network routing protocols (OSPF, BGP, RIP)

Schaums outlines: Discrete mathematics.
3rd edition by Seymor Lipschutz & Marc Lipson
Shortest path problem (Wikipedia article).
Princeton University, article on the introduction of Dijkstras algorithm.
Louisiana State University, article on Floyd-Warshalls algorithm.
Floyd-Warshalls shortest path algorithm (Wikipedia Article)
Dijkstras Algorithm (Wikipedia article)