
Introduction

Data Structures and Algorithms


Angus Yeung, Ph.D.

Agenda for Lecture 1

WEEK #1
LECTURE #1

Book-keeping

About the Instructor

Assignments, Grading Policy, Add Code, etc.

Introduction
Why study Data Structures and Algorithms
Working with large data sets

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

About Dr. Yeung

Senior Software Manager, Intel Corp.
Manages teams developing Android and iOS mobile software for wearable devices
Ph.D., M.S., B.S. in Electrical & Computer Engineering
Berkeley MBA
Lives with his wife and 3 boys in Palo Alto

Assessment
Class Profile
Sophomore, Junior, Senior, Open University, etc.

What do you know about CS146?
What do you want to get out of this class?
What are your expectations of your class instructor?
Have you taken CS146 before?

Class Climate
Work really hard: this is a challenging class!
Attend each class, because
Not everything I teach will be available on Canvas
I will go over some selected problem sets in class
Those students who skipped my last class all failed the class

No cellphones in class
Don't sleep in class
No web browsing in class

Assignments
Assignments will include both written and programming problem sets
Do your own homework assignments, as the concepts will be tested in quizzes/exams
All homework assignments will be submitted electronically on Canvas
As such, no late or make-up assignments will be graded

Adding to this Class

The class is full, but I may be able to add a very few students to this class.
I will check a couple of things before I pass out an add code to a student:
Pre-requisites?
Attendance in the first few lectures?
Senior or graduating students?
Repeating CS146?

However, I won't pass out any add codes in the first week.

Getting a Recommendation
You must earn an A- or better grade in this class if you want to ask me for a recommendation letter.
Don't add me on Facebook, but I can consider your invitation to connect on LinkedIn after knowing you for some time in this class.
It is okay to ask me questions about the hi-tech industry.

Textbook
We will use "Data Structures and Algorithm Analysis in Java" by Mark Weiss in this class.

Slide Deck
PowerPoint slides for each lecture will follow the textbook very closely.

(Slide annotations used throughout: Highlight, Comment, Textbook Section, Math)

Grading Policy
The percentage weights assigned to class assignments, the group project, and the final exam are listed below:

Grading Policy
Grades will be assigned as described below. This scale may be adjusted once the final exam has been graded to provide a letter grade distribution that matches the expected average for this class.

Course Schedule

Course Structure

Foundation
Attend lectures
Read book chapters

Reinforcement
Written class assignments
Java programming assignments
Quiz after each assignment

Integration
Review course material
Study for exams

Introduction to CS146
Algorithm Analysis
Assignment 1
Lists, Stacks and Queues
Assignment 2
Trees
Hashing
Assignment 3
Priority Queues
Mid-Term
Sorting
The Disjoint Set Class
Assignment 4
Graph Algorithms
Final Exam

Purpose of this Course

This course addresses two important aspects of computer science:
Data Structures
Methods of organizing large amounts of data
Algorithm Analysis
Estimation of the running time of algorithms

Selection Problem
Determine the kth largest of a group of N numbers
Algorithm 1:
Read the N numbers into an array
Sort the array in decreasing order by some algorithm
Return the element in position k

Algorithm 2:
Read the first k elements into an array
Sort them in decreasing order
Read each remaining element one by one
Ignore the element if it is smaller than the kth element
Otherwise, place the element in the array, bumping one element out

Which algorithm is better?
Is either algorithm good enough?
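A minimal Java sketch of Algorithm 1 (class and method names are my own, not from the slide):

```java
import java.util.Arrays;

// Algorithm 1: read the N numbers into an array, sort, then
// return the element in position k (1-based, k-th largest).
public class Selection {
    public static int kthLargest(int[] numbers, int k) {
        int[] a = numbers.clone();   // read the N numbers into an array
        Arrays.sort(a);              // ascending sort...
        return a[a.length - k];      // ...so the kth largest is the kth from the end
    }
}
```

Algorithm 2 keeps only k elements in memory, which matters when N is huge; the running-time question raised above is taken up in Chapter 2.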

Word Puzzle Problem

Given a 2-D array of letters and a list of words, find the words in the puzzle
Algorithm 1:
Check each ordered triple (row, column, orientation)
Use lots of nested for loops to find matches

Algorithm 2:
Check each ordered quadruple (row, column, orientation, character#) that doesn't run off an end of the puzzle
Use lots of nested for loops to find matches

Are both algorithms practical enough if the word list is a dictionary?
Is it possible, even with a large word list, to solve the problem in a matter of seconds?

Working with Large Data Sets

Writing a working program is not good enough if the program is to be run on a large data set
Running time becomes an issue:
How do we estimate the running time of a program for large inputs?
How do we compare the running times of two programs without actually coding them?
How can we determine program bottlenecks?
How do we use optimization techniques to improve the speed of a program?

Agenda for Lecture 2

WEEK #1
LECTURE #2

Book-Keeping
Mathematics Review
Exponents, Logarithms
Series, Harmonic Numbers, Euler's Constant
Modular Arithmetic
Proof by Induction
Recursive Functions

Generic Programming (not using Generics)
Using Object for Genericity
Wrappers for Primitive Types
Using Interface Types for Genericity
Compatibility of Array Types

Generic Programming (Using Generics)
Simple Generic Classes and Interfaces
Autoboxing / Unboxing
The Diamond Operator, Wildcards with Bounds
Generic Static Methods, Type Bounds

Stay Tuned on Canvas

Announcements
New CS146 classes
Reading List

Files Uploaded:
Slide deck for Chapter 1
Green Sheet

Upcoming:
Assignment 1 will be posted
Revised slide deck for Chapter 1

New CS146 Class Added

New CS146 Classes Added:
A new CS146 section taught by Girish on TT 7:30-8:45 AM for Spring 2015.
Another CS146 class, TT noon-1:55 PM, taught by Evan this summer.

Already enrolled in my CS146 class:
Do nothing
Consider other sections if they work out better for you

Know anyone who is still trying to add CS146?
Let the person know about this new CS146 section.

Waiting List for Sections 4 and 7:
34 students on my waiting list for Sections 4 and 7.
14 students are graduating seniors or seniors

Readings for Lectures #1 & #2

Readings for Lecture #1:
Section 1.1 "What's the Book About"

Readings for Lecture #2:
Section 1.2 "Mathematics Review"
Section 1.3 "A Brief Introduction to Recursion"
Section 1.4 "Implementing Generic Components Pre-Java 5"
Section 1.5 "Implementing Generic Components Using Java 5 Generics" (excluding 1.5.7 and 1.5.8)

Mathematics Review
Exponents

Logarithms
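The formulas under these headings did not survive extraction; the standard identities covered in a typical review (e.g., Weiss Section 1.2) are:

```latex
X^A X^B = X^{A+B} \qquad \frac{X^A}{X^B} = X^{A-B} \qquad (X^A)^B = X^{AB} \qquad 2^N + 2^N = 2^{N+1}
```

```latex
X^A = B \iff \log_X B = A \qquad \log_A B = \frac{\log_C B}{\log_C A} \qquad \log AB = \log A + \log B \qquad \log X < X \text{ for all } X > 0
```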


Mathematics Review
Series
Geometric Series: A^0 + A^1 + ... + A^N = (A^(N+1) - 1)/(A - 1)
Arithmetic Series: 1 + 2 + ... + N = N(N+1)/2
Harmonic Numbers: H_N = 1 + 1/2 + ... + 1/N, with H_N ≈ ln N + γ
Euler's Constant: γ = lim (H_N - ln N) ≈ 0.57721566

The values H_N - ln N approach γ:
H1 - ln 1 = 1,
H2 - ln 2 = 0.806852...,
H3 - ln 3 = 0.734721...,
H4 - ln 4 = 0.697038...,
H5 - ln 5 = 0.673895...,
H6 - ln 6 = 0.658240...,
H7 - ln 7 = 0.646946...,
H8 - ln 8 = 0.638415...,
H9 - ln 9 = 0.631743...

Mathematics Review
Modular Arithmetic
A ≡ B (mod N) if N divides A - B
Example: 81 ≡ 61 ≡ 1 (mod 10)

If N is a prime number:
ab ≡ 0 (mod N) is true if and only if a ≡ 0 (mod N) or b ≡ 0 (mod N)
ax ≡ b (mod N) has a unique solution (for a not ≡ 0)
x² ≡ a (mod N) has either two solutions or no solutions

Mathematics Review
Proof by Induction
Prove that the Fibonacci numbers
F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, ..., Fi = Fi-1 + Fi-2
satisfy Fi < (5/3)^i, for i ≥ 1.
Base Case: F1 = 1 < 5/3 and F2 = 2 < 25/9
Inductive Hypothesis: We assume that the theorem is true for i = 1, 2, ..., k
Proof: By definition, Fk+1 = Fk + Fk-1
Use the inductive hypothesis on the right-hand side:
Fk+1 < (5/3)^k + (5/3)^(k-1)
     = (3/5)(5/3)^(k+1) + (3/5)^2 (5/3)^(k+1)
     = (3/5)(5/3)^(k+1) + (9/25)(5/3)^(k+1)
     = (3/5 + 9/25)(5/3)^(k+1)
     = (24/25)(5/3)^(k+1)
     < (5/3)^(k+1)

Mathematics Review
Proof by Induction
Prove that:
Base Case: the theorem is true when N = 1
Inductive Hypothesis: We assume that the theorem is true for 1 ≤ k ≤ N
Proof:
We have:
Use the inductive hypothesis on the right-hand side:

Mathematics Review
Proof by Counterexample
Prove that the statement "Fk ≤ k²" is false
The easiest way to prove this is to compute F11 = 144 > 121 = 11².

Proof by Contradiction
Prove that there is an infinite number of primes.
1. Assume that the theorem is false, so there is some largest prime Pk.
2. Show that this assumption implies that some known property is false.
Let P1, P2, ..., Pk be all the primes in order and consider N = P1 P2 P3 ... Pk + 1
Clearly, N is larger than Pk, so by assumption N is not prime.
However, none of P1, P2, ..., Pk divides N exactly, because there will always be a remainder of 1.
This is a contradiction, because every number is either prime or a product of primes.
3. Hence the original assumption was erroneous.
This implies that the theorem is true.

Recursive Functions
Consider the following function:
f(0) = 0 and
f(x) = 2f(x-1) + x^2

How do we implement it in Java?
Java allows functions to be recursive
What a bad idea! For illustrative purposes only

Fundamental rules of recursion:
1. Must establish some base cases
These set up the exit conditions
2. Must make progress toward a base case
For example, f(-1) will not converge
3. Assume that all the recursive calls work
So we don't need to mind the details of the book-keeping arrangements
4. Must never duplicate work by solving the same instance of a problem in separate recursive calls
The so-called compound interest rule
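A direct Java translation of the definition above (for illustration only, as the slide warns):

```java
// f(0) = 0 and f(x) = 2 f(x-1) + x^2, implemented recursively.
// The base case f(0) stops the recursion; each call makes
// progress toward it by recursing on x - 1.
public class RecDemo {
    public static long f(int x) {
        if (x == 0)
            return 0;                           // base case
        return 2 * f(x - 1) + (long) x * x;     // recursive call
    }
}
```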

Generic Implementation
The generic mechanism promotes code reuse, an important aspect of object-oriented programming
Java didn't support generic implementation directly until Java 5
Pre-Java 5: generic methods and classes can be implemented in Java using the principle of inheritance
Use Object for Genericity
Use Interface Types for Genericity

Using Object for Genericity

We can use an appropriate superclass, i.e., the Java Object class, as the generic type
All class objects inherit from the Java Object class, explicitly or implicitly (this excludes primitive types)
To access a specific method of the object, we must downcast to the correct type
Using Object as a generic type works only if the operations that are being performed can be expressed using only methods available in the Object class.
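A sketch of an Object-based generic container in the spirit of this slide (the slide's own code did not survive extraction; the class name here is my own):

```java
// An Object-based "generic" cell: it can store any object,
// but the caller must downcast on the way out.
public class ObjectCell {
    private Object storedValue;

    public Object read() { return storedValue; }
    public void write(Object x) { storedValue = x; }
}
```

Usage requires a downcast: `String s = (String) cell.read();` which the compiler cannot check for us.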

Using Object for Genericity

Some useful Object methods:
String toString()
boolean equals(Object obj)
Object clone()
int hashCode()

Wrappers for Primitive Types

Java provides a wrapper class for each of the eight primitive types (which are not compatible with Object)
For example, the wrapper class for the int type is Integer
Each wrapper object is immutable (meaning its state can never change after the object is constructed)
As with other Java classes, the Object class is the superclass of each wrapper class
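A minimal round trip through the Integer wrapper, matching the description above:

```java
// Wrap an int into an immutable Integer, then read it back
// with intValue(). The value is fixed when the object is created.
public class WrapperDemo {
    public static int roundTrip(int x) {
        Integer boxed = Integer.valueOf(x);  // wrap the primitive
        return boxed.intValue();             // unwrap it again
    }
}
```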

Wrappers for Primitive Types

Since Integer is immutable, its value (a primitive type) is assigned via the constructor.
intValue() is a method of Integer.

Using Interface Types

Must implement the Comparable interface
Invoke the concrete compareTo method
Objects must be compatible, e.g., share the same superclass Shape
Primitives cannot be passed as Comparables, but the wrappers can
Covariant data types: be careful with them!
May not work if a class can't implement the needed interface (e.g., a library class or a final class)
It is not required that the interface be a standard library interface.
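A pre-generics sketch of this idea (the slide's code image is missing; this is a typical findMax using the raw Comparable interface, so primitives must be passed as wrappers):

```java
// Pre-Java 5 style: every element must implement Comparable,
// and we invoke the concrete compareTo method on it.
public class MaxFinder {
    public static Comparable findMax(Comparable[] arr) {
        int maxIndex = 0;
        for (int i = 1; i < arr.length; i++)
            if (arr[i].compareTo(arr[maxIndex]) > 0)
                maxIndex = i;
        return arr[maxIndex];
    }
}
```

The raw Comparable type compiles with warnings under Java 5 and later; the generics version with type bounds (Section 1.5) removes both the warnings and the casts.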

Compatibility of Array Types

Java arrays are type-compatible; this is known as a covariant array type.
For example: covariance of arrays is needed so Lines 29 & 30 can compile.

But this may sometimes cause type confusion.
Assume that Employee IS-A Person and Student IS-A Person
Compiles: arrays are compatible.
Compiles: Student IS-A Person.

We have a type confusion because Student IS-NOT-A Employee; at run time the store fails with an ArrayStoreException (no ClassCastException is involved, since the compiler inserted no cast).
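The type confusion can be demonstrated directly (Person, Student, and Employee here are stand-ins for the slide's classes):

```java
// Array covariance: the first assignment compiles because
// Student[] IS-A Person[], but storing an Employee into what is
// really a Student[] fails at run time with ArrayStoreException.
public class CovarianceDemo {
    static class Person {}
    static class Student extends Person {}
    static class Employee extends Person {}

    public static boolean storeFails() {
        Person[] people = new Student[1];   // compiles: arrays are covariant
        try {
            people[0] = new Employee();     // compiles: Employee IS-A Person
            return false;                   // never reached
        } catch (ArrayStoreException e) {
            return true;                    // the JVM catches the type confusion
        }
    }
}
```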

Generic Classes & Interfaces

The class or interface declarations include type parameters enclosed in angle brackets <> after the class or interface name.
Generic types are available in Java 5 and higher.
Type checking will occur at compile time rather than at run time.

More on Generic Types

A generic type has one or more type variables
Type variables are instantiated with class or interface types
Cannot use primitive types, e.g., no ArrayList<int>
When defining generic classes, use the type variables in the definition:

public class ArrayList<E>
{
    public E get(int i) { . . . }
    public E set(int i, E newValue) { . . . }
    . . .
    private E[] elementData;
}

Generic classes are converted by the compiler to non-generic classes by a process known as type erasure.
The benefit is that the programmer does not have to place casts in the code, and the compiler will do significant type checking.

Generic Classes & Interfaces

Type parameters are usually single capital letters.
For example,
public interface Map<K, V> {}
defines a generic Map interface with two type parameters, K and V.

Generic Methods
Generic method = method with type parameter(s)

public class Utils
{
    public static <E> void fill(ArrayList<E> a, E value, int count)
    {
        for (int i = 0; i < count; i++)
            a.add(value);
    }
}

A generic method in an ordinary (non-generic) class
Type parameters are inferred at the call site:

ArrayList<String> ids = new ArrayList<String>();
Utils.fill(ids, "default", 10); // calls Utils.<String>fill

Autoboxing / Unboxing
Java 5 adds autoboxing and unboxing features.
Autoboxing: If an int is passed in a place where an Integer is required, the compiler will insert a call to the Integer constructor behind the scenes.
Auto-unboxing: If an Integer is passed in a place where an int is required, the compiler will insert a call to the intValue method behind the scenes.
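Both conversions at work (a minimal sketch; the helper name is my own):

```java
import java.util.ArrayList;
import java.util.List;

// Autoboxing: the int arguments are boxed into the Integers the
// list requires. Auto-unboxing: get() returns Integer, which is
// unboxed so the + operator can add plain ints.
public class BoxingDemo {
    public static int sumFirstTwo(int a, int b) {
        List<Integer> vals = new ArrayList<Integer>();
        vals.add(a);                        // autoboxing: int -> Integer
        vals.add(b);
        return vals.get(0) + vals.get(1);   // auto-unboxing: Integer -> int
    }
}
```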

The Diamond Operator

The diamond operator simplifies the code when the type parameter is known.
Since the type parameter is known, we can use the diamond operator throughout.
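A small illustration (the diamond operator itself arrived in Java 7; the map here is a hypothetical example, not the slide's):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// With <>, the compiler infers the type arguments on the
// right-hand side from the declared type on the left.
public class DiamondDemo {
    public static int demo() {
        Map<String, List<Integer>> scores = new HashMap<>();  // not new HashMap<String, List<Integer>>()
        scores.put("quiz1", new ArrayList<>());
        scores.get("quiz1").add(95);
        return scores.get("quiz1").get(0);
    }
}
```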

Wildcards with Bounds

Wildcards are used to express subclasses (or superclasses) of parameter types.
Now we can pass in Shape's subclasses, such as Circle and Square.
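A sketch with a hypothetical Shape hierarchy (the slide's figure did not survive): the bounded wildcard lets totalArea accept a list of Shape or of any subclass of Shape.

```java
import java.util.List;

// List<? extends Shape> accepts List<Shape>, List<Square>, etc.
// A plain List<Shape> parameter would reject List<Square>.
public class WildcardDemo {
    public abstract static class Shape { public abstract double area(); }
    public static class Square extends Shape {
        private final double side;
        public Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    public static double totalArea(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes)
            total += s.area();
        return total;
    }
}
```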

Generic Static Methods

A generic static method is a generic method with its type parameter declared explicitly (remember: this is a generic method, not a generic class or interface).
A generic static method forces type checking at compile time instead of run time.

Type Bounds
The type bound specifies properties that the parameter types must have.

Type Bound: AnyType IS-A Comparable<T>, where T is a superclass of AnyType
Without the bound, the compiler cannot prove that the call to compareTo is valid.
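The standard findMax idiom using exactly this bound:

```java
// AnyType must be comparable to itself or to one of its
// superclasses; this is what lets the compiler prove that the
// compareTo call below is valid.
public class BoundedMax {
    public static <AnyType extends Comparable<? super AnyType>> AnyType findMax(AnyType[] arr) {
        int maxIndex = 0;
        for (int i = 1; i < arr.length; i++)
            if (arr[i].compareTo(arr[maxIndex]) > 0)
                maxIndex = i;
        return arr[maxIndex];
    }
}
```

Compare with the raw-Comparable version in Section 1.4: no casts are needed here, and a mixed-type array is rejected at compile time.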

2. Algorithm Analysis

Data Structures and Algorithms


Angus Yeung, Ph.D.

Agenda for Lecture 3

WEEK #2
LECTURE #3

Book-keeping
Add Codes
Assignment #1

Generic Implementation
The Diamond Operator
Wildcards with Bounds
Generic Static Methods
Type Bounds

Target for this lecture: finish the basis of algorithmic analysis

Algorithmic Analysis
Relative Rates of Growth
Big-Oh Analysis
Upper and Lower Bounds
Typical Growth Rates
Common Order-of-Growth Classifications


Purpose of this Chapter

An algorithm is a specified set of simple instructions to be followed to solve a problem correctly.
Once an algorithm is given for a problem, we determine how much resource, such as time and space, the algorithm will require.
As such, we want to discuss the following aspects in this chapter:

Time Estimation
How to estimate the time required for a program
Reduction of Running Time
How to reduce the running time of a program, for example, from days or years to fractions of a second
Recursion
The results of careless use of recursion
Efficient Algorithms
Very efficient algorithms to raise a number to a power and to compute the greatest common divisor of two numbers

Mathematical Background
Throughout this course we will use the following four definitions to establish a relative order among functions:

Relative Rates of Growth

Given two functions, there are usually points where one function is smaller than the other, so it doesn't make sense to claim, for instance, f(N) < g(N)
We compare two functions using their relative rates of growth
Example: Compare f(N) = 1,000N with g(N) = N²
If N < 1,000, f(N) > g(N)
If N > 1,000, f(N) < g(N)

Definition 2.1 and Big-Oh

Definition 2.1: T(N) = O(f(N)) if there are positive constants c and n0 such that T(N) ≤ c f(N) when N ≥ n0.

Definition 2.1 explained:
There is some point n0 past which c f(N) is always at least as large as T(N)
If the constant factors are ignored, f(N) is at least as big as T(N)

Let's illustrate this definition with our example:
Case 1: T(N) = 1,000N, f(N) = N², n0 = 1,000, and c = 1
Case 2: T(N) = 1,000N, f(N) = N², n0 = 10, and c = 100

Bad style: f(N) ≤ O(g(N))
Wrong: f(N) ≥ O(g(N))

We use Big-Oh notation in our expression:
1,000N = O(N²) ("order N-squared" or "Big-Oh N-squared")

In summary, Definition 2.1 says that:
Growth rate of T(N) ≤ growth rate of f(N)

Definitions 2.2 - 2.4

Definition 2.2 (Big-Omega) explained: Growth rate of T(N) ≥ growth rate of g(N)

Definition 2.3 (Big-Theta) explained: Growth rate of T(N) = growth rate of h(N)

Definition 2.4 (little-oh) explained: Growth rate of T(N) < growth rate of h(N)

Upper and Lower Bounds

T(N) = O(f(N)) means f(N) is an upper bound on T(N)
Example: N² = O(N³)
Example: N² = O(2N²)

T(N) = Ω(f(N)) means f(N) is a lower bound on T(N)
Example: N³ = Ω(N²)
Example: N² = Ω(2N²)

Hint: we can ignore the constant in estimating growth rates.

We want to make the result as tight as possible
For example, if g(N) = 2N², then g(N) = O(N⁴), g(N) = O(N³), and g(N) = O(N²) are all correct.
g(N) = O(N²) gives the best result because it is the tightest possible description of the upper bound.

Upper and Lower Bounds

Upper and lower bounds valid for n > n0 smooth out the behavior of complex functions

Upper and Lower Bounds

O notation: c·g(n) is an upper bound for f(n)
Ω notation: c·g(n) is a lower bound for f(n)
Θ notation: c1·g(n) is an upper bound for f(n) and c2·g(n) is a lower bound for f(n)

Big-Theta, Big-Oh and Big-Omega

Big-Theta, e.g., Θ(N²)
Provides: asymptotic order of growth
Shorthand for: N²; 10 N²; 5 N² + 22 N log N + 3N
Used to: classify algorithms

Big-Oh, e.g., O(N²)
Provides: Θ(N²) and smaller
Shorthand for: 10 N²; 100 N; 22 N log N + 3 N
Used to: develop upper bounds

Big-Omega, e.g., Ω(N²)
Provides: Θ(N²) and larger
Shorthand for: N²; N⁵; N³ + 22 N log N + 3 N
Used to: develop lower bounds

Typical Growth Rates

Order-of-Growth Classifications
Common order-of-growth classifications:
1, log N, N, N log N, N², N³, and 2^N

Order-of-Growth Classifications

Order of growth: 1 (constant)
Typical code: a = b + c;
Description: statement
Example: add two numbers

Order of growth: log N (logarithmic)
Typical code: while (N > 1) { N = N / 2; ... }
Description: divide in half
Example: binary search

Order of growth: N (linear)
Typical code: for (int i = 0; i < N; i++) { ... }
Description: loop
Example: find the maximum

Order of growth: N log N (linearithmic)
Typical code: see mergesort
Description: divide and conquer
Example: mergesort

Order of growth: N² (quadratic)
Typical code: for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) { ... }
Description: double loop
Example: check all pairs

Order of growth: N³ (cubic)
Typical code: for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) for (int k = 0; k < N; k++) { ... }
Description: triple loop
Example: check all triples

Order of growth: 2^N (exponential)
Typical code: see combinatorial search
Description: exhaustive search
Example: check all subsets

Merge Sort - Explained

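A compact sketch of the mergesort idea from the slides above (the lecture's own figures did not survive extraction): split in half, sort each half recursively, merge the sorted halves.

```java
import java.util.Arrays;

// Mergesort: the canonical N log N divide-and-conquer sort.
public class MergeSortDemo {
    public static void mergeSort(int[] a) {
        if (a.length <= 1)
            return;                                    // base case
        int[] left  = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] right = Arrays.copyOfRange(a, a.length / 2, a.length);
        mergeSort(left);                               // sort each half recursively
        mergeSort(right);
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)    // merge the sorted halves
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }
}
```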

Merge Sort - Demo


Useful Rules
Rule 1
If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
T1(N) + T2(N) = O( f(N) + g(N) )
T1(N) * T2(N) = O( f(N) * g(N) )

Rule 2
If T(N) is a polynomial of degree k, then
T(N) = Θ( N^k )

Rule 3
log^k N = O(N) for any constant k
This tells us that logarithms grow very slowly.

Agenda for Lecture 4

WEEK #2
LECTURE #4

Review of Lecture #3
Big-Oh Revisited

Book-keeping
Assignment #1

Algorithmic Analysis (Chapter 2)
L'Hôpital's rule
Memory Usage
Sample Problem: Maximum Subsequence Sum Problem
Algorithm #1: O(N³)
Algorithm #2: O(N²)
Algorithm #3: O(N log N)
Algorithm #4: O(N)

Some Tips for Big-Oh

Tip 1: It is very bad style to include constants or low-order terms inside a Big-Oh.
Don't write this:
T(N) = O(2N²)
T(N) = O(N² + N)
Write this:
T(N) = O(N²)

Tip 2: To compare the relative growth rates of two functions f(N) and g(N), use L'Hôpital's rule if necessary (most of the time, this method is overkill).
For example, compute lim N→∞ f(N)/g(N):
The limit is 0: f(N) = o(g(N)), i.e., g(N) grows faster
The limit is c ≠ 0: f(N) = Θ(g(N)), i.e., f(N) and g(N) grow at the same rate
The limit is ∞: g(N) = o(f(N)), i.e., f(N) grows faster
The limit does not exist: there is no relation

Downloading a File from the Internet

Problem: downloading a file from the Internet:
Initial delay: 3 seconds
Download speed: 1.5 MB/s
File size: N MB

Downloading time: T(N) = N/1.5 + 3
Analysis: T(1,500) ≈ 2 T(700) (a linear function)
Big-Oh: T(N) = O(N)
Even though T(N) = Θ(N) would be more precise, Big-Oh answers are more typical.

How about Memory?


Memory Usage


One More Example


What to Analyze
Running time is the most important resource to analyze.
Several factors can affect the running time:
Computer
Compiler
Programming Language
Need to consider the implementation inefficiency of a programming language
Algorithm
Use the average-case performance, not the best-case performance
Input to the Algorithm
Use Tworst(N), not Tavg(N)

Unless otherwise specified, we use the worst-case running time as the quantity to analyze

Max Subsequence Sum Problem

Given (possibly negative) integers A1, A2, ..., AN, find the maximum value of the sum of a contiguous subsequence.
For example:
For input -2, 11, -4, 13, -5, -2, the answer is 20 (A2 through A4)

There are many algorithms to solve this problem.
Algorithm 4 is clearly the best choice for large amounts of input.

Max Subsequence Sum Problem

The figure shows the growth rates of the running times of the four algorithms.

Max Subsequence Sum Problem

For larger values of N, the performance merit of each algorithm becomes more evident.

A Simple Running Time Example

The cost of this simple program is 6N + 4 units, so we say that this method is O(N):
Declarations: count 0
Initializing the running sum: count 1
For loop: initialization 1, testing N + 1, incrementing N
Loop body (two multiplications, one addition, one assignment): 4N
Return statement: count 1

General Rules
Rule 1: for loops
The running time of the statements inside the for loop × the number of iterations = total
Example: 4 × N = 4N, which is O(N)

Rule 2: Nested Loops
The running time of the statement × the product of the sizes of all the loops = total
Example: O(N²)

General Rules
Rule 3: Consecutive Statements
Just add (the maximum is the one that counts)
Example: O(N) + O(N²) = O(N²); the lower order can be ignored, the higher order counts

Rule 4: if/else
Running time of the test plus the larger of the running times of the statements inside the if branch and the else branch
Example:

Algorithm 1
Cubic maximum contiguous subsequence sum algorithm
Three nested loops give O(N³); the O(N²) lower-order work can be ignored.

Algorithm 2
Eliminating Lines 13 and 14 in Algorithm 1, we can reduce the running time to O(N²).

Algorithm 3
We can use a divide-and-conquer strategy and further improve the solution to O(N log N).
The idea is to split the problem into two roughly equal subproblems and solve them recursively.

The maximum subsequence sum can either occur entirely in the left half of the input, or entirely in the right half, or cross the middle into both halves.
We use recursive calls to find the maximum subsequence sums in the left half and the right half of the input.
The best sums reaching the middle from both halves can be added together to determine if the maximum subsequence sum crosses the middle.

Algorithm 3
We can use a divide-and-conquer strategy and simplify the solution to O(N log N).
Key points in the code: the stopping condition, the special case for an odd number of input entries, and the recursive calls.
Starting point: the calling function for Algorithm 3.

Algorithm 3

Let T(N) be the time to solve a maximum subsequence sum problem of size N, and let T(1) be one unit.
T(1) = 1, T(N) = 2T(N/2) + O(N)
Observation: T(2) = 2·2, T(4) = 4·3, T(8) = 8·4, T(16) = 16·5
Conclusion: If N = 2^k, then T(N) = N(k+1) = N log N + N = O(N log N)
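The conclusion above can also be derived by telescoping: divide the recurrence through by N (taking the O(N) term as N itself, matching the slide's units) and add up the resulting equations.

```latex
\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + 1,\qquad
\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + 1,\qquad
\ldots,\qquad
\frac{T(2)}{2} = \frac{T(1)}{1} + 1
```

There are log N equations; adding them, the intermediate terms cancel, leaving T(N)/N = T(1)/1 + log N = 1 + log N, so T(N) = N log N + N = O(N log N), exactly the N(k+1) observed above.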

Algorithm 4
If we don't need to know the actual best subsequence, the design of the algorithm can be further simplified to O(N).

One loop only!
One pass through the data
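A sketch of this one-pass O(N) algorithm (in the spirit of the textbook's Algorithm 4; the class name here is my own): keep a running sum, and reset it to 0 whenever it goes negative, since a negative prefix can never start a best subsequence.

```java
// Maximum subsequence sum in one pass through the data.
public class MaxSubSum {
    public static int maxSubSum(int[] a) {
        int maxSum = 0, thisSum = 0;
        for (int j = 0; j < a.length; j++) {
            thisSum += a[j];
            if (thisSum > maxSum)
                maxSum = thisSum;      // new best subsequence ends at j
            else if (thisSum < 0)
                thisSum = 0;           // a negative prefix can't help; start over
        }
        return maxSum;
    }
}
```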

3. Lists, Stacks and Queues

Data Structures and Algorithms

Angus Yeung, Ph.D.


Agenda for Lecture 6

WEEK #3
LECTURE #6

Lists, Stacks and Queues (Chapter 3)
Abstract Data Types (ADTs)
The List ADT
Simple Array Implementation of Lists
Simple Linked Lists

Lists in the Java Collections API
Collection Interface
Iterators
The List Interface, ArrayList, and LinkedList
ListIterators

Implementation of ArrayList
The Basic Class
The Iterator and Java Nested and Inner Classes

Java Review:
Static Class
Inner Class
Local Inner Class
Anonymous Inner Class

Chapter 3 Overview
Chapter 3 discusses some of the simplest and most basic data structures:
Introduce the concept of Abstract Data Types (ADTs)
Show how to efficiently perform operations on lists
Introduce the Stack ADT and its use in implementing recursion
Introduce the Queue ADT and its use in operating systems and algorithm design

Abstract Data Types

An abstract data type (ADT) is a set of objects together with a set of operations.
A mathematical abstraction
Objects: lists, sets, graphs, etc.
Operations: add, remove, contains, union, find, etc.
Implementation: may have multiple implementations, hidden away from users

Abstraction: the interface is separated from the implementation in order to hide the details of the implementation
Similar to primitive types in Java: int, double, float, etc.

List ADT
The list ADT views its data much like an array does: elements are accessible via consecutive indices.

Lists are dynamic:
can grow or shrink
length is not fixed

There are never gaps between items in a list

The List ADT


The List ADT


List provides a flexible interface that allows inserting
and removing elements anywhere in the list.
When a list is implemented by using an array:
Pros:
printList is done in linear time.
The findKth operation takes constant time.

Cons:
Trade-off for this flexibility: insertion and removal are O(N), instead of O(1) in
other data structures.
Worst case: inserting into position 0 requires shifting all the
elements in the list up one spot.

List: Shifting
5, 8, 2, 1, 4, 7
add(3,6)
5, 8, 2, 6, 1, 4, 7
removeAt(1) returns 8
5, 2, 6, 1, 4, 7
Elements changed positions

A List may not be


implemented with an
array, so no actual
shifting may occur.
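The shifting example above can be reproduced directly with java.util.ArrayList; note that the slide's add(3,6) means "insert the value 6 at index 3":

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShiftDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(5, 8, 2, 1, 4, 7));

        list.add(3, 6);               // insert 6 at index 3; later items shift right
        System.out.println(list);     // [5, 8, 2, 6, 1, 4, 7]

        int removed = list.remove(1); // remove index 1; later items shift left
        System.out.println(removed);  // 8
        System.out.println(list);     // [5, 2, 6, 1, 4, 7]
    }
}
```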

Simple Array Implementation


Although arrays are created with a fixed capacity,
we can create a different array with double the
capacity when needed.
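The capacity-doubling trick can be sketched in a few lines; this mirrors what ArrayList does internally, though the class and field names here are illustrative:

```java
import java.util.Arrays;

public class GrowableIntArray {
    private int[] items = new int[4]; // fixed capacity to start
    private int size = 0;

    public void add(int x) {
        if (size == items.length)
            // allocate a new array with double the capacity and copy over
            items = Arrays.copyOf(items, items.length * 2);
        items[size++] = x;
    }

    public int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return items[i];
    }

    public int size() { return size; }
}
```

Because each doubling copies N items but pays for N further constant-time adds, add is constant time on average (amortized).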

3.2.1

Simple Linked List


Elements are not stored contiguously
Avoids the shifting cost of insertion and deletion.

Consists of a series of nodes, which are not


necessarily adjacent in memory
printList and findKth operations are no longer
as efficient as in an array implementation

3.2.2

Linked List: Insertion & Deletion


Elements are not stored contiguously
Avoids the shifting cost of insertion and deletion.

Deletion

Insertion

3.2.2

Removing the Last Node


Not so easy to remove the last node:
Search for the node whose next link points to the last node
Change that next link to null
Update the link to the last node

A doubly linked list solves this problem

3.2.2

Java Collections API


Many data structures, e.g., the List ADT, are
implemented in Java's Collections API
in package java.util.

Sample methods in the Java Collection interface:


size: returns the number of items in the collection
isEmpty: returns true if and only if the size is
zero
contains: returns true if x is in the collection
add/remove: adds/removes item x to/from the
collection

3.3.1

Subset of the Collection Interface


The Collection interface extends the Iterable interface.
Classes that implement the Iterable interface can provide a
way to view all their items.

Capitalized I (in Iterable)

This i (in iterator) is not capitalized.

3.3.2

Iterators
Collections that implement the Iterable interface must provide a method
named iterator.
The method iterator returns an object of type Iterator.
Iterator is an interface defined in package java.util and is shown below:

3.3.2

Print All the Items


Each call to next gives the next item in the collection, and hasNext can be
used to tell if there is a next item.
When the compiler sees an enhanced for loop being used on an object that is
Iterable, it mechanically obtains an Iterator and then makes calls to next and
hasNext.

This is an enhanced for loop.
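Since the slide's code listing did not survive extraction, here is a representative version of the print routine (the method name printAll is illustrative); the comments show the iterator calls the enhanced for loop is compiled into:

```java
import java.util.Arrays;
import java.util.List;

public class PrintDemo {
    // The enhanced for loop below is equivalent to:
    //   Iterator<T> it = coll.iterator();
    //   while (it.hasNext()) { T item = it.next(); System.out.println(item); }
    public static <T> void printAll(Iterable<T> coll) {
        for (T item : coll)
            System.out.println(item);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lists", "stacks", "queues");
        printAll(words);
    }
}
```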

3.3.2

remove using Iterator


Both the Collection and Iterator interfaces contain a
method called remove.
Collection's remove method must first find the item to
remove.
Iterator's remove method removes the last item returned
by next.
This is more efficient in some cases, e.g., removing every other item
in the collection.

Beware of ConcurrentModificationException:


thrown when a structural change (e.g., a call to Collection's
remove) is made to the collection being iterated.
That's another reason to prefer the iterator's remove method:
there is no concurrent-modification problem.

3.3.2

The List Interface


The List interface extends Collection, so it contains
All the methods in the Collection interface, plus
A few others (shown below)
get/set: access or change the item
at an index; the index goes
from 0 to size()-1
remove is overloaded to remove an
item at a specified position

3.3.3

ArrayList and LinkedList


ArrayList provides a growable array
implementation:
Advantage: calls to get and set take constant time
Disadvantage: if the changes are not made at the end, insertion
and removal are expensive

LinkedList provides a doubly linked list


implementation of the List ADT.
Advantage: if the position of the change is known, insertion of
new items and removal of existing items is cheap
Disadvantage: since LinkedList is not easily indexable, calls
to get are expensive (unless they are close to one of the ends)

3.3.3

Making a List
Whether an ArrayList or LinkedList is passed as a
parameter, the running time of the following method is O(N),

because each call to add (to the end of the list) takes constant time.

If we are adding to the front, then


ArrayList takes O(N²)
LinkedList is an O(N) operation

Here we are adding a new item


to the front of the list

3.3.3

Sum of the Numbers in a List


Example: compute the sum of the numbers in a List
Running time for an ArrayList: O(N)
Running time for a LinkedList: O(N²), because calls to
get are O(N) operations.
Running time using an enhanced for loop and the List's
iterator: O(N)

Similarly, we want to use


contains and remove in
Collection because
ArrayList and LinkedList
are inefficient for searches.
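The variants compared above can be sketched as follows (method names are illustrative); the getter loop is the one that degrades to O(N²) on a LinkedList, while the iterator-based loop is O(N) on both:

```java
import java.util.List;

public class SumDemo {
    // O(N) on ArrayList, O(N^2) on LinkedList: each get(i) walks the links.
    public static int sumWithGet(List<Integer> lst) {
        int total = 0;
        for (int i = 0; i < lst.size(); i++)
            total += lst.get(i);
        return total;
    }

    // O(N) on both: the enhanced for loop uses the list's own iterator.
    public static int sumWithIterator(List<Integer> lst) {
        int total = 0;
        for (int x : lst)
            total += x;
        return total;
    }
}
```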

3.3.3

Agenda for Lecture 7


Lists, Stacks and Queues (Chapter 3)
Lists in the Java Collections API (cont'd)
ListIterators

Implementation of ArrayList
The Basic Class
The Iterator and Java Nested and Inner Classes
Java Review:

Static Class
Inner Class
Local Inner Class
Anonymous Inner Class

Implementation of LinkedList

WEEK #4
LECTURE #7

Example: remove Even Numbers


Example: remove all even-valued items in a List
Before: 6, 5, 1, 4, 2

After: 5, 1

Using an ArrayList is a losing strategy!

The remove is not efficient, so the routine takes quadratic time.

Using a LinkedList has problems as well.

The call to get is not efficient, taking quadratic time.


The call to remove is equally inefficient, because it is expensive to get to position i.

3.3.4

Improvement: remove Even Numbers


Improvement 1: Use an iterator instead of get.
Problem: ConcurrentModificationException with
Collection's remove

Improvement 2: Use the iterator's remove to avoid the


ConcurrentModificationException problem.

Before:
ArrayList -> O(N²)
LinkedList -> O(N²)
After:
ArrayList -> O(N²)
because array items must be shifted
LinkedList -> O(N)
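Improvement 2 can be written as follows (a standard pattern; the class and method names here are illustrative):

```java
import java.util.Iterator;
import java.util.List;

public class RemoveEvens {
    // Removes every even-valued item using the iterator's own remove,
    // which is safe during iteration and O(1) per removal on a LinkedList.
    public static void removeEvens(List<Integer> lst) {
        Iterator<Integer> it = lst.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0)
                it.remove(); // removes the last item returned by next
        }
    }
}
```

Calling lst.remove(...) inside this loop instead would trigger a ConcurrentModificationException on the next call to it.next().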

3.3.4

Running Times for Improvement #2


Running times for our code in Improvement #2
LinkedList: linear growth rate O(N)
ArrayList: quadratic growth rate O(N²)

List Type              No. of Items   Running Time (s)
LinkedList<Integer>    800,000        0.039
LinkedList<Integer>    1,600,000      0.073
ArrayList<Integer>     800,000        300
ArrayList<Integer>     1,600,000      1,200

3.3.4

ListIterators
A ListIterator extends the functionality of an Iterator for Lists:

previous and hasPrevious allow traversal of the list from the back to the
front
add places a new item into the list at the current position

Normal starting point: next


returns 5, previous is illegal,
add places an item before 5

next returns 8, previous


returns 5, add places an item
between 5 and 8

next is illegal, previous


returns 9, add places an item
after 9

3.3.5

Implementation of ArrayList
Outline of a usable ArrayList generic class:
MyArrayList

Maintains the underlying array, the array capacity, and the


current number of items stored
Provides a mechanism to change the capacity of the
underlying array
Provides an implementation of get and set
Provides basic routines, such as size, isEmpty, and
clear, as well as remove and two versions of add

Provides a class that implements the Iterator interface,
e.g., next, hasNext, and remove.

3.4

MyArrayList: The Basic Class


3.4.1

MyArrayList: The Basic Class


3.4.1

Inner Class - 1
This iterator version doesn't work because theItems and
size() are not part of the ArrayListIterator class.

Problem with scoping

3.4.2

Inner Class - 2
The iterator is a top-level class and stores the current position and a link to the
MyArrayList. It doesn't work because theItems is private in the
MyArrayList class.
It is defined as private.

It is a HAS-A relationship.

Error here!

3.4.2

Inner Class - 3
This time it works: the iterator is a nested class and stores the current
position and a link to the MyArrayList. It works because the nested class is
considered part of the MyArrayList class.
ArrayListIterator is defined
inside of MyArrayList.

static indicates
a nested class

3.4.2

Inner Class - 4
This one works as well: the iterator is an inner class and stores
the current position and an implicit link to the MyArrayList.
An inner class doesn't have
the static keyword

We are using the implicit link to


the MyArrayList here.

3.4.2

Nested Classes in Java


Static Class: declared as a static member of
another class
Inner Class: declared as an instance member of
another class
Local Inner Class: declared inside an instance
method of another class
Anonymous Inner Class: like a local inner class, but
written as an expression which returns a one-off
object

Static Classes
The nested class has access to its containing
class's private static members (is it useful at all?)

package pizza;

public class Rhino {
    ...

    public static class Goat {
        ...
    }
}

Inner Classes
An inner class is a class declared as a non-static member of another class
The inner class instance has access to the instance members of the
containing class instance.
These enclosing instance members are referred to inside the inner class via
just their simple names, not via this (this in the inner class refers to the
inner class instance, not the associated containing class instance).

package pizza;

public class Rhino {

    public class Goat {
        ...
    }

    private void jerry() {
        Goat g = new Goat();
    }
}

Local Inner Classes


A local inner class is a class declared in the body of a
method.
Such a class is only known within its containing
method, so it can only be instantiated and have its
members accessed within its containing method.
Because a local inner class is neither the member of a
class nor of a package, it is not declared with an access
level.
Access to the containing class's instance members is
as in an instance inner class.

Anonymous Inner Classes


A local inner class is instantiated at most just once
each time its containing method is run.
Use like this:
new *ParentClassName*(*constructorArgs*) {*members*}

Cannot supply your own constructor


Setup using an initializer block: a {} block placed
outside any method
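A short illustration of the syntax above, using an instance initializer block in place of a constructor (the class and variable names are invented for this sketch):

```java
import java.util.ArrayList;

public class AnonymousDemo {
    public static void main(String[] args) {
        // Anonymous subclass of ArrayList<String>; the {} initializer
        // block runs at construction, since an anonymous class cannot
        // declare its own constructor.
        ArrayList<String> greetings = new ArrayList<String>() {
            { add("hello"); add("world"); } // instance initializer block
        };
        System.out.println(greetings.size()); // 2
    }
}
```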

Implementation of LinkedList
MyLinkedList: contains links to both ends, the size of
the list, and a host of methods.
Node: contains data and links to the previous and next
nodes, along with appropriate constructors.
LinkedListIterator: implements Iterator with
next, hasNext, and remove methods.

3.5

Adding a Node


3.5

Removing a Node

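Since the slides' figures did not survive extraction, here is a sketch of the two operations on doubly linked nodes (field names follow the MyLinkedList style, but the exact code is a reconstruction, not the textbook's listing). It assumes sentinel header and tail nodes, so p.prev and p.next are never null:

```java
public class DoublyLinked<T> {
    static class Node<T> {
        T data;
        Node<T> prev, next;
        Node(T d, Node<T> p, Node<T> n) { data = d; prev = p; next = n; }
    }

    // Splice a new node holding x in front of node p (four link changes).
    static <T> Node<T> addBefore(Node<T> p, T x) {
        Node<T> newNode = new Node<>(x, p.prev, p);
        p.prev.next = newNode;
        p.prev = newNode;
        return newNode;
    }

    // Unlink node p by making its neighbors point past it (two link changes).
    static <T> void remove(Node<T> p) {
        p.prev.next = p.next;
        p.next.prev = p.prev;
    }
}
```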

3.5

Agenda for Lecture 8


Book-Keeping
Solutions for Assignment #1
Assignment #2

Lists, Stacks and Queues (Chapter 3)


The Stack ADT
Stack Model
Implementation of Stacks
Applications

The Queue ADT


Queue Model
Array Implementation of Queues

Short Review Questions

WEEK #4
LECTURE #8

The Stack ADT


A stack is a list with the restriction that insertions
and deletions can be performed only at the end of the
list, called the top.
Stack model:
Only the top element is accessible.

Push and pop operations are based on


LIFO (last in, first out)

3.6.1

Implementation of Stacks
Linked List Implementation of Stacks
Push: insert at the front of the list
Top/Pop: return the value of the element at the front
of the list and delete it.

Array Implementation of Stacks


Push: increment topOfStack and set the element at
topOfStack.
Top/Pop: return the value of the element at
topOfStack and decrement topOfStack.
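The array version can be sketched in a few lines (a simplified reconstruction of the idea above; the class name is illustrative, and the array grows by doubling as with ArrayList):

```java
import java.util.Arrays;

public class IntStack {
    private int[] items = new int[4];
    private int topOfStack = -1; // index of the top element; -1 means empty

    public void push(int x) {
        if (topOfStack + 1 == items.length)
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        items[++topOfStack] = x;
    }

    public int top() {
        if (isEmpty()) throw new IllegalStateException("empty stack");
        return items[topOfStack];
    }

    public int pop() {
        int x = top();
        topOfStack--;
        return x;
    }

    public boolean isEmpty() { return topOfStack == -1; }
}
```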

3.6.2

Application: Balancing Symbols


Compilers need to check that all symbols are balanced, e.g.,
[()] is legal, but [(] is wrong.
One can use a stack to balance symbols:

Make an empty stack.


Read characters until end of file.
If the character is an opening symbol, push it onto the stack.
If it is a closing symbol and the stack is empty, report an
error;
Otherwise, pop the stack.
If the symbol popped is not the corresponding opening symbol,
then report an error.
At end of file, if the stack is not empty, report an error.
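The algorithm above, sketched for the three bracket pairs (the method name isBalanced is illustrative; a Deque serves as the stack):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SymbolChecker {
    public static boolean isBalanced(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c);                     // opening symbol: push
                    break;
                case ')': case ']': case '}':
                    if (stack.isEmpty()) return false; // closing with no opener
                    char open = stack.pop();
                    if ((c == ')' && open != '(') ||
                        (c == ']' && open != '[') ||
                        (c == '}' && open != '{'))
                        return false;                  // mismatched pair
                    break;
                default:
                    // ignore all other characters
            }
        }
        return stack.isEmpty(); // leftover openers mean an error
    }
}
```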

3.6.3

Application: Postfix Expressions


Postfix or reverse Polish notation expresses an expression:
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =, as
4.99 1.06 * 5.99 + 6.99 1.06 * +

The easiest way to evaluate postfix is to use a stack.


For example, 6 5 2 3 + 8 * + 3 + *

A + is read, so 3 and 2
are popped from the stack.

The sum 5 is pushed.

Next 8 is pushed.

With a *, 8 and 5 are


popped and 40 is pushed.

Next a + is seen, so 40
and 5 are popped and 5 +
40 = 45 is pushed.

Now 3 is pushed.

With a +, 3 and 45 are


popped and 48 is
pushed.

Finally, a * is seen and


48 and 6 are popped. The
result 6 * 48 = 288 is
pushed.
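The worked trace above can be checked with a small evaluator (integer-only sketch; the class and method names are illustrative). Note the right operand is popped first:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluates a space-separated integer postfix expression,
    // e.g. "6 5 2 3 + 8 * + 3 + *".
    public static long evalPostfix(String expr) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    long b = stack.pop(); // right operand popped first
                    long a = stack.pop();
                    switch (tok) {
                        case "+": stack.push(a + b); break;
                        case "-": stack.push(a - b); break;
                        case "*": stack.push(a * b); break;
                        default:  stack.push(a / b);
                    }
                    break;
                }
                default:
                    stack.push(Long.parseLong(tok)); // operand: push
            }
        }
        return stack.pop(); // the final value left on the stack
    }
}
```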

3.6.3

Infix to Postfix Conversion


A stack can be used to convert an expression in standard form (known as
infix) into postfix.
Given the infix expression:
a + b * c + (d * e + f) * g ,

the postfix is:

a b c * + d e * f + g * +

The stack represents pending operators. Some of the operators on the


stack that have high precedence are known to be completed and should
be popped.

3.6.3

Infix to Postfix Algorithm


Infix to postfix conversion using a stack:
a + b * c + (d * e + f) * g

1. Symbol a -> output, Operator + -> stack, Symbol b -> output.

2. Operator * -> stack; the top entry of the operator stack has lower
precedence -> nothing is output. Symbol c -> output.

3. Operator + -> stack; pop and output * and +.

4. Operator ( -> stack, Symbol d -> output.

5. Operator * -> stack; there is no output because of (. Symbol e -> output.

6. Operator + -> stack; pop and output *. Symbol f -> output.

7. Operator ) is read: output +, empty the ( ).

8. Operator * -> stack, Symbol g -> output.

9. Input is now empty: pop and output


all symbols from the stack.
3.6.3

Application: Method Calls


When a call is made to a new method,

all the variables local to the calling routine need to be


saved by the system.
The current location in the routine must be saved so the
new method knows where to go after it is done.

This can be implemented like the balancing-symbols


algorithm, using a stack.
The information saved on the stack is called an activation
record or stack frame:
Register values are saved
The return address is saved at the top

3.6.3

Running Out of Stack Space


Remember we learned that a recursive function can be a bad idea,
because of:
Overhead associated with the stack frame
Problems with stack overflow

The tail recursion problem.


Can use a while loop
because nothing needs
to be saved

3.6.3

Using a while Loop


The tail recursion problem can be resolved by using a
while loop instead of a recursive function call.

Sometimes the compiler might


automatically detect tail
recursion and use a scheme
similar to the while-loop
implementation.

3.6.3

The Queue ADT


Queue Model:

Array Implementation of Queues

Enqueue: set the back element, currentSize++


Dequeue: return the element at the front, currentSize--
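A circular-array queue along the lines of the next slide can be sketched as follows (wraparound is handled with a modulus; the class name is illustrative and growth is omitted to keep the sketch short):

```java
public class IntQueue {
    private int[] items;
    private int front = 0, back = -1, currentSize = 0;

    public IntQueue(int capacity) { items = new int[capacity]; }

    public void enqueue(int x) {
        if (currentSize == items.length) throw new IllegalStateException("full");
        back = (back + 1) % items.length; // wrap around the end of the array
        items[back] = x;
        currentSize++;
    }

    public int dequeue() {
        if (currentSize == 0) throw new IllegalStateException("empty");
        int x = items[front];
        front = (front + 1) % items.length; // wrap around as well
        currentSize--;
        return x;
    }

    public int size() { return currentSize; }
}
```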

3.7

Circular Array Implementation

3.7.2

Inheritance & Generic Type


Given:

Which statement inserted independently at line 9 will compile? (Choose all
that apply.)
A. return new ArrayList<Inn>();
B. return new ArrayList<Hotel>();
C. return new ArrayList<Object>();
D. return new ArrayList<Business>();

Anonymous Subclass
Given:

What is the result?
A. An exception occurs at runtime
B. true
C. Fred
D. Compilation fails because of an error on line 3
E. Compilation fails because of an error on line 4
F. Compilation fails because of an error on line 8
G. Compilation fails because of an error on a line other than 3, 4, or 8

Inner Class
Given:

Which, inserted independently at line 6, compile and produce the output "spooky"?
(Choose all that apply.)
A. Sanctum s = c.new Sanctum();
B. c.Sanctum s = c.new Sanctum();
C. c.Sanctum s = Cathedral.new Sanctum();
D. Cathedral.Sanctum s = c.new Sanctum();
E. Cathedral.Sanctum s = Cathedral.new Sanctum();

Local Inner Class


Given:

What is the result?
A. inner
B. outer
C. middle
D. Compilation fails
E. An exception is thrown at runtime

4. Trees

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Lists, Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms

Final Exam

Agenda for Lecture 9


Book-Keeping
Assignment #2
Quiz #2

Trees (Chapter 4)
Preliminaries

Implementation of Trees
Tree Traversals with an Application

Binary Trees

Implementation
An Example: Expression Trees

The Search Tree ADT: Binary Search Trees

contains
findMin and findMax
insert
remove
Average-Case Analysis

WEEK #5
LECTURE #9

Chapter 4 Overview
Trees in general are very useful abstractions in
computer science. In Chapter 4, we will
See how trees are used to implement the file system
of several popular operating systems.
See how trees can be used to evaluate arithmetic
expressions.
Show how to use trees to support searching
operations in O(log N) average time, and how to refine
these ideas to obtain O(log N) worst-case bounds.
Discuss and use the TreeSet and TreeMap classes.

4.0

Tree Preliminaries
parent
edge

Grandparent

child

Every node except the


root has one parent

Depth = 2
Grandchild

siblings
Height (the longest path) = 3

4.1

Implementation of Trees
class TreeNode
{
    Object element;
    TreeNode firstChild;
    TreeNode nextSibling;
}

4.1.1

Application: Unix File System

Advantages:
Allows users to organize their data logically.
Two files in different directories can share the same
name.

4.1.2

Pre-order Traversal

Preorder traversal: work at a node is performed


before its children are processed.
Line 1: print name (once per node);
Line 2: test directory (once per node);
Line 4: recursive call (once for each child);

The total amount of work is constant per node: O(N)

4.1.2

Post-order Traversal

Postorder traversal: work at a node is


performed after its children are evaluated.

If the object is not a directory, size returns


the number of blocks it uses;
Otherwise, the number of blocks used by the
directory is added to the number of blocks
(recursively) found in all the children.
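The postorder size computation can be sketched on the first-child/next-sibling TreeNode structure from earlier (the method name totalBlocks and the blocks field are illustrative, not the textbook's listing):

```java
public class FileTree {
    static class TreeNode {
        int blocks;          // blocks used by this file or directory itself
        TreeNode firstChild; // null for plain files
        TreeNode nextSibling;
        TreeNode(int b) { blocks = b; }
    }

    // Postorder: a node's total is computed after its children's totals.
    static int totalBlocks(TreeNode t) {
        int total = t.blocks;
        for (TreeNode c = t.firstChild; c != null; c = c.nextSibling)
            total += totalBlocks(c);
        return total;
    }
}
```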

4.1.2

Binary Trees
Binary Tree

No node can have more than two children;


The depth of an average binary tree is considerably smaller
than N.
Average depth: O(√N)
Depth for a binary search tree: O(log N)
Worst case: O(N)

Worst-case

4.2

Implementation of Binary Trees


Implementation of a binary tree:
A node consists of the element information plus two
references (left and right) to other nodes
We can keep direct links to children because a binary
tree node has at most two children

4.2.1

Example: Expression Trees


An example of binary trees: expression trees

Leaves: operands, e.g., constants or variable names


Internal nodes: operators
(a + b * c) + ((d * e + f) * g)

General strategy:

Inorder traversal (left, node, right):


(a + (b * c)) + (((d * e) + f) * g)
Postorder traversal (left subtree, right subtree, operator):
a b c * + d e * f + g * +

4.2.2

Constructing an Expression Tree


Input (postfix): a b + c d e + * *
Constructing an expression tree using a stack:
1. Push operands a
and b onto a stack

2. Read +, pop two trees


and form a new tree.

3. Read c, d, and e; create a


one-node tree for each.

4. Read +, pop two trees


and form a new tree

5. Read *, pop two trees


and form a new tree.

6. Read *, pop two trees


and form the final tree.

4.2.2

The Search Tree ADT


An important application of binary trees is their use in searching.
Binary search tree: for every node X in the tree:

the values of all the items in its left subtree are smaller than the item in X,
the values of all the items in its right subtree are larger than the item in X.

The average depth of a binary search tree is O(log N).

This is NOT a
binary search tree
This is a binary
search tree

4.3

contains, findMin and findMax


4.3.1 contains
Check the node first, then make a recursive call on a subtree
of T
It can be either the left or right subtree, depending on the
relationship of X to the item stored in T.

4.3.2 findMin and findMax


findMin: start at the root and go left as long as there
is a left child;
findMax: start at the root and go right as long as
there is a right child;

4.3

insert
To insert X into tree T, proceed down the tree as you would
with a contains.
If X is found, do nothing (or update something).
Otherwise, insert X at the last spot on the path traversed.
Example: Adding
5 to the tree.
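contains and insert together can be sketched as follows (a reconstruction in the textbook's spirit, not its exact listing; the class name IntBST is illustrative):

```java
public class IntBST {
    static class Node {
        int element;
        Node left, right;
        Node(int x) { element = x; }
    }

    private Node root;

    public boolean contains(int x) {
        Node t = root;
        while (t != null) {
            if (x < t.element) t = t.left;       // go left for smaller values
            else if (x > t.element) t = t.right; // go right for larger values
            else return true;                    // found
        }
        return false;
    }

    public void insert(int x) { root = insert(x, root); }

    private Node insert(int x, Node t) {
        if (t == null) return new Node(x); // last spot on the path traversed
        if (x < t.element) t.left = insert(x, t.left);
        else if (x > t.element) t.right = insert(x, t.right);
        // else: duplicate found, do nothing
        return t;
    }
}
```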

4.3.3

remove
To remove node X from tree T:

If X is a leaf, it can be deleted immediately.


If X has one child, the node can be deleted after its parent adjusts a link to
bypass the node.
If X has two children, replace the data of this node with the smallest data of
the right subtree and recursively delete that node.

Deletion of node (4) with one child

Deletion of node (2) with two children

4.3.4

Average-Case Analysis
A binary search tree should take O(log N) time because
in constant time we descend a level in the tree, thus
operating on a tree that is now roughly half as large.
Internal path length, D(N): the sum of the depths of all
nodes in an N-node tree
D(N) = D(i) + D(N - i - 1) + N - 1
If all subtree sizes are equally likely, we can put the
average value of the subtree sizes into D(N):

4.3.5

Example: 500-node tree


Average of D(N) = O(N log N)
The expected depth of any node is O(log N)
A 500-node tree has nodes at expected depth 9.98.

4.3.5

Unbalanced Tree
After a quarter-million random insert/remove
pairs, the tree looks decidedly unbalanced (average
depth equals 12.51).

4.3.5

Agenda for Lecture 10


Trees (Chapter 4)
The Search Tree ADT: Binary Search Trees
Review of source code

AVL Trees
Single Rotation
Double Rotation
Review of source code

Splay Trees
Splaying

B-Trees

WEEK #5
LECTURE #10

AVL Trees
AVL (Adelson-Velskii and Landis) tree: a binary search tree with
a balance condition.
Requiring the left and right subtrees to have the same height is too strict
Identical to a binary search tree, except that for every node, the
heights of the left and right subtrees can differ by at most 1.
Height of an AVL tree: at most 1.44 log(N+2) - 1.328

AVL

Unbalanced

Unbalanced

4.4

Smallest AVL Tree of Height 9


Example: smallest AVL tree of height 9

Fewest nodes (143)


Left subtree: a height-7 tree of minimum number of nodes
Right subtree: a height-8 tree of minimum size

4.4

Rebalancing for Insertion


Rebalancing the node α
Height imbalance: α's two subtrees' heights differ by two
Insertion on the outside: use a single rotation for balancing
Case 1: an insertion into the left subtree of the left child of α.
Case 4: an insertion into the right subtree of the right child of α.

Insertion on the inside: use a double rotation for balancing


Case 2: an insertion into the right subtree of the left child of α.
Case 3: an insertion into the left subtree of the right child of α.

4.4

Single Rotation: Fixing Case 1


Violation of the AVL balance property

After insertion in Case 1, the left subtree of node k2 is two levels deeper


than its right subtree

Rebalancing

Move X up a level and Z down a level


Grab k1 and shake it, letting gravity take hold
Subtree Y now becomes the left child of k2
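The Case 1 fix can be sketched as the classic rotate-with-left-child routine (a standard reconstruction, not the textbook's exact listing; the node class is simplified and heights are recomputed naively):

```java
public class AvlRotation {
    static class Node {
        int element;
        Node left, right;
        Node(int x, Node l, Node r) { element = x; left = l; right = r; }
    }

    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Single rotation for Case 1: k2's left child k1 becomes the new root
    // of this subtree; k1's old right subtree Y becomes k2's left child.
    static Node rotateWithLeftChild(Node k2) {
        Node k1 = k2.left;
        k2.left = k1.right; // subtree Y moves across
        k1.right = k2;      // k2 drops a level
        return k1;          // k1 is the new subtree root
    }
}
```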

4.4.1

Single Rotation: Example


Unbalanced tree after insertion of node 6
Balanced tree after a single rotation between 7 and 8
unbalanced

The child node
becomes the
new root

4.4.1

Single Rotation: Fixing Case 4


Case 4 represents a case symmetric to Case 1

4.4.1

Single Rotation: Example


Start with an empty AVL tree, and insert the items 3,
2, 1, and then 4 through 7 in sequential order.
Inserting 3, 2, 1

Inserting 4, 5

Inserting 6

Inserting 7

4.4.1

Double Rotation
A single rotation fails to fix Cases 2 or 3.

4.4.2

Double Rotation
Left-right double rotation to fix Case 2

http://www.cs.uah.edu/~rcoleman/CS221/Trees/AVLTree.html

4.4.2

Double Rotation
Right-left double rotation to fix Case 3

http://www.cs.uah.edu/~rcoleman/CS221/Trees/AVLTree.html

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 16, 15: right-left double rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 14: right-left double rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 13: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 12: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 11, 10: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 8: no rotation
Inserting 9: double rotation

4.4.2

Agenda for Lecture 11


Trees (Chapter 4)
Quiz 2
Book-Keeping

WEEK #6
LECTURE #11

Assignment #2 is extended to March 4


Visual Go

Splay Trees
B-Trees
Sets and Maps
Review of AVL source code (if time allows)

Splay Trees
Basic ideas for splay trees:

Not a new type of tree, but a re-implementation of the


binary search tree insert, delete, and search methods
The goal is to improve their performance

No single operation on a splay tree is guaranteed to


have better performance

But a series of M operations will take O(M log N) time for a


tree of N nodes, whenever M > N

Not highly balanced like an AVL tree

Lowering the cost of an entire series of operations is more


important than keeping the tree balanced

4.5

Splay Trees
Whenever a splay tree node is accessed, the tree
performs splaying operations that move the accessed
node to the root of the tree.
Splaying a node consists of a series of rotations.
Similar to AVL tree rotations.

The goal is to move the accessed node to the root.


A side benefit is to make the tree more balanced.

The theory is that once a node has been accessed, it


will soon be accessed again.
Future accesses are fast if the node is the root.

4.5

Splay Trees
If a node has not been accessed in a while, you
pay the performance penalty of splaying the next
time it is accessed.
But access of that node in the near future is very fast.
So we amortize the cost of splaying over future
operations.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Zig-zag
Strategy: rotate bottom-up along the access path.
If X is a right child and P is a left child (or vice versa): perform a double rotation.

If both X and P are left children (or both right children): transform the tree as below:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Example
Example, with a contains on k1.

k1 is a zig-zag, so perform a double rotation using k1, k2, k3:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Example
k1 is a zig-zag again, so we do a rotation with k1, k4, and k5.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
How is a worst-case BST created?
When all the nodes are inserted in sorted order.
Suppose the bottom node is accessed in such a tree:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 1

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 2

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 3

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 4

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 5

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 6

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 7

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 8

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 9

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

B-trees
A B-tree is a tree data structure suitable for disk
drives.
It may take up to 11 ms to access data on disk.
Today's CPUs can execute billions of
instructions per second.
Therefore, it makes sense to spend CPU cycles
to reduce the number of disk accesses.

B-trees are often used to implement databases.


CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
A B-tree is an M-ary tree (allowing for M-way branching).

A 5-ary tree of 31 nodes has only three levels.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree of order 5

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree after insertion of 57 into the tree

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
Insertion of 55 into the B-tree causes a split into two leaves

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
Insertion of 40 causes a split into two leaves and then a split of
the parent node.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree after the deletion of 99 from the B-tree

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

Sets
The Set interface is inherited from Collection.
Does not allow duplicates.
Operations: insert, remove, etc.
Very efficient basic search.

SortedSet interface:

all items are kept in sorted order.

TreeSet implements the SortedSet interface.

Basic operations take logarithmic worst-case time.

Items must implement the Comparable (or Comparator) interface.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.8.1
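The TreeSet behavior described above can be demonstrated in a few lines. A minimal sketch (the class name TreeSetDemo is mine) showing that duplicates are ignored and that iteration is in sorted order:

```java
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(5); set.add(1); set.add(3);
        set.add(3);                      // duplicate insert is ignored
        if (set.size() != 3) throw new AssertionError();
        // first()/last() are O(log N); iteration visits items in sorted order
        if (set.first() != 1 || set.last() != 5) throw new AssertionError();
        System.out.println(set);         // prints [1, 3, 5]
    }
}
```

Integer implements Comparable, which is what gives TreeSet its ordering.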

Maps
The Map interface, unlike Set, is not part of the Collection hierarchy.

Consists of keys and their values.

Keys must be unique, but several keys can map to the same value.
Basic operations: isEmpty, clear, size, containsKey, get,
put.

SortedMap interface:

all items are in logically sorted order.

Iterating through a Map can be tricky because a Map has no
iterator. Instead these methods are used:

keySet: we can return a Set
here since keys are unique.

entrySet: returns a Set of entries.


CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.8.2
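The keySet/entrySet idea above can be sketched concretely. This example (class name and data are mine) shows unique keys, value replacement on re-put, and iteration over the entry Set in place of a direct iterator:

```java
import java.util.HashMap;
import java.util.Map;

public class MapIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("ann", 30);
        ages.put("bob", 25);
        ages.put("ann", 31);             // re-putting a key replaces its value
        if (ages.size() != 2) throw new AssertionError();

        // A Map has no iterator of its own: iterate over the entry Set instead
        int total = 0;
        for (Map.Entry<String, Integer> e : ages.entrySet())
            total += e.getValue();
        if (total != 56) throw new AssertionError();

        // keySet() returns a Set because keys are unique
        if (!ages.keySet().contains("bob")) throw new AssertionError();
    }
}
```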

Summary

Tree | Description | Application

Binary Trees | At most two child nodes. | Easy to implement.

Binary Search Trees | Left subtree is less than X; right subtree is larger than X. | Efficient search: node visits = height of the tree.

AVL Trees | The heights of the left and right subtrees differ by not more than one level. Always rebalance after insertion or removal (single/double rotations). | Efficient search: always maintains minimum height of the tree.

Splay Trees | Once a node is accessed, it should be made very accessible for the future. This is done by moving the accessed node to the root. | Repeated access of a recently accessed node is very efficient.

B-Trees | Implementation of a balanced M-ary search tree, allowing M-way branching. | Efficient search for data on disk drives.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

5. Hashing

Dr. Angus Yeung

Course Structure
Founda'on

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integra'on

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

List Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

Final Exam

Agenda for Lecture 12


Book-Keeping
Review of Quiz #2

Hashing (Chapter 5)
General Idea
Hash Function
Separate Chaining
Hash Tables without Linked Lists
Rehashing

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #6
LECTURE #12

Why Hash Tables?


Hash tables are good for doing a quick search on things.
For instance, suppose we have an array full of data (say 100 items). If we knew
the position where a specific item is stored in the array, then we could
quickly access it.

For instance, if we happen to know that the item we want is at position
3, we can apply: myitem = myarray[3];
With this, we don't have to search through each element in the array; we
just access position 3.
The question is, how do we know that position 3 stores the data we are
interested in?

This is where hashing comes in handy. Given some key, we can apply a
hash function to it to find the index or position that we want to access.

Chapter Overview
Hashing:
Implementation of hash tables
Technique for performing insertions, deletions, and
searches in constant average time

We will cover the following in this chapter:


Several methods of implementing the hash table
Comparing these methods analytically
Numerous applications of hashing
Comparing hash tables with binary search trees
5.0

General Idea
Hash table data structure: an array of
some fixed size, containing the items.
Each item could consist of a key and
additional data fields.
Hash function:
the mapping that converts each key into some
number in the range 0 to TableSize - 1,
used to place the item in the appropriate cell.
It should distribute the keys evenly among the cells.

Collision:
when two keys hash to the same value.
5.1

Hash Function
Some simple hash functions:
Keys are integers: return Key mod TableSize.
Keys are strings: add up the ASCII values of the
characters in the string.

(What if TableSize = 10,007 and a typical key hashes
to at most 127 * 8 = 1,016? That will not be a
good and equitable distribution.)

5.2
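The two simple hash functions above can be sketched as follows (class and method names are mine). The main method also demonstrates why the ASCII-sum function distributes poorly for a large table:

```java
public class SimpleHash {
    // Integer keys: Key mod TableSize (floorMod avoids negative results).
    public static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // String keys: sum of character values -- simple but poorly distributed.
    public static int hash(String key, int tableSize) {
        int sum = 0;
        for (char ch : key.toCharArray()) sum += ch;
        return sum % tableSize;
    }

    public static void main(String[] args) {
        // With tableSize = 10,007, an 8-character key sums to at most
        // 127 * 8 = 1,016, so only the low end of the table is ever used.
        if (hash("aaaaaaaa", 10007) > 8 * 127) throw new AssertionError();
        if (hash(23, 7) != 2) throw new AssertionError();
    }
}
```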

Hash Function
Some simple hash functions:
Keys have at least three characters: hash on the first three.

Problem: English is not random. The 26^3 = 17,576 possible
combinations actually reduce to only 2,851 combinations.
5.2

Hash Function
A good hash function:
Mapping using:

Handling the possible overflow problem
5.2
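The formulas on this slide were lost in extraction; a common "good" hash function of the kind the textbook uses is a base-37 polynomial of the key's characters, evaluated by Horner's rule, with the overflow handled by fixing up a negative result after the mod. A sketch under that assumption:

```java
public class PolynomialHash {
    // Horner's-rule evaluation of a base-37 polynomial of the characters,
    // taken mod tableSize at the end.
    public static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);   // may overflow to negative
        hashVal %= tableSize;
        if (hashVal < 0)                              // handle the overflow case
            hashVal += tableSize;
        return hashVal;
    }

    public static void main(String[] args) {
        int h = hash("junk", 10007);
        if (h < 0 || h >= 10007) throw new AssertionError();  // always in range
    }
}
```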

Separate Chaining

One collision resolution strategy is separate chaining.


Separate chaining: keep a list of all elements that hash to the same value.
Example: hash(x) = x mod 10
The load factor of a hash table is defined as the ratio of the number of
elements in the hash table to the table size.

5.3
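The separate chaining idea above, with hash(x) = x mod 10, can be sketched as an array of linked lists (a minimal illustration, not the textbook's full implementation):

```java
import java.util.LinkedList;

public class ChainedHashTable {
    private final LinkedList<Integer>[] lists;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        lists = new LinkedList[size];
        for (int i = 0; i < size; i++) lists[i] = new LinkedList<>();
    }

    private int hash(int x) { return Math.floorMod(x, lists.length); }

    public void insert(int x) {
        LinkedList<Integer> chain = lists[hash(x)];
        if (!chain.contains(x)) chain.add(x);   // colliding keys share one list
    }

    public boolean contains(int x) { return lists[hash(x)].contains(x); }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(10);  // hash(x) = x mod 10
        t.insert(4); t.insert(14); t.insert(24);        // all collide in list 4
        if (!t.contains(14) || t.contains(34)) throw new AssertionError();
    }
}
```

The load factor here is 3/10; search cost grows with the average chain length.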

Linear Probing

Besides separate chaining, another strategy is called open addressing.


Probing hash tables: do not use separate chaining.
Linear probing: when a collision happens, try cells sequentially (with wraparound) in search of an
empty cell.
Primary clustering: even if the table is relatively empty, blocks of occupied cells start forming.

5.4.1
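A minimal sketch of linear probing with wraparound (class name is mine; it omits deletion and rehashing), using keys 89, 18, 49 in a size-10 table so that 49 collides with 89 and wraps to cell 0:

```java
public class LinearProbingTable {
    private final Integer[] cells;

    public LinearProbingTable(int size) { cells = new Integer[size]; }

    public void insert(int x) {
        int pos = Math.floorMod(x, cells.length);
        while (cells[pos] != null && cells[pos] != x)  // try cells sequentially,
            pos = (pos + 1) % cells.length;            // wrapping around
        cells[pos] = x;
    }

    public int positionOf(int x) {                     // -1 if absent
        int pos = Math.floorMod(x, cells.length);
        while (cells[pos] != null) {
            if (cells[pos] == x) return pos;
            pos = (pos + 1) % cells.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        t.insert(89); t.insert(18); t.insert(49);      // 49 collides at cell 9
        if (t.positionOf(49) != 0) throw new AssertionError();  // wrapped to 0
        if (t.positionOf(18) != 8) throw new AssertionError();
    }
}
```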

Linear Probing

Number of probes plotted against load factor for linear probing (dashed) and random
strategy (S is successful search, U is unsuccessful search, and I is insertion)

5.4.1

Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the
primary clustering problem of linear probing: f(i) = i^2
When a collision occurs, the next position attempted is one cell away
(then 4, 9, ... cells away).

5.4.2

Double Hashing
Secondary clustering: although quadratic probing eliminates primary clustering,
elements that hash to the same position will probe the same alternative
cells.
Double hashing eliminates secondary clustering with f(i) = i * hash2(x):
apply a second hash function to x and probe at a distance hash2(x),
2*hash2(x), and so on.

5.4.3

Rehashing
Problems when the table gets too full:
Running time for the operations will take too long.
Insertions might fail for open addressing hashing with
quadratic resolution, especially if there are too many
removals intermixed with insertions.

Solution:
Build another table that is about twice as big (with an
associated new hash function).
Scan down the entire original hash table and compute the
new hash value for each (non-deleted) element.
Insert the new hash values in the new table.
5.5
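The rehashing steps above can be sketched for a tiny linear probing table. This is my own illustration: it rehashes from size 7 to the next prime after doubling (17) once the table is half full, which differs from the slide's 70% example but follows the same build-scan-reinsert procedure:

```java
import java.util.ArrayList;
import java.util.List;

public class RehashDemo {
    static Integer[] table = new Integer[7];   // h(x) = x mod 7, linear probing

    static void insert(int x) {
        if (occupied() >= table.length / 2) rehash();   // table too full
        int pos = Math.floorMod(x, table.length);
        while (table[pos] != null) pos = (pos + 1) % table.length;
        table[pos] = x;
    }

    static int occupied() {
        int n = 0;
        for (Integer v : table) if (v != null) n++;
        return n;
    }

    static void rehash() {
        List<Integer> old = new ArrayList<>();
        for (Integer v : table) if (v != null) old.add(v); // scan old table
        table = new Integer[17];            // about twice as big, prime: 17
        for (int v : old) {                 // recompute every hash value
            int pos = v % 17;
            while (table[pos] != null) pos = (pos + 1) % 17;
            table[pos] = v;
        }
    }

    public static void main(String[] args) {
        for (int x : new int[] {13, 15, 6, 24, 23}) insert(x);
        if (table.length != 17) throw new AssertionError();  // rehash happened
        if (occupied() != 5) throw new AssertionError();     // nothing was lost
    }
}
```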

Rehashing
Hash function h(x) = x mod 7
After inserting 23, the hash table is over 70% full.

5.5

Rehashing
Linear probing hash table after rehashing.
New hash function: h(x) = x mod 17

5.5

Agenda for Lecture 13


Book-Keeping
Hashing (Chapter 5)

WEEK #7
LECTURE #13

Hash Tables in the Standard Library


Hash Tables with Worst-Case O(1) Access
Perfect Hashing
Cuckoo Hashing
Hopscotch Hashing

Extendible Hashing

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

Implementation of Hash Tables


Java includes hash table implementations of Set and Map: HashSet
and HashMap.
Must provide equals and hashCode methods.
Implemented using separate chaining hashing.
Use them when we don't care about sorted order.

For the word-changing example:

A map in which the key is a word length, and the value is a collection of all words
of that word length. -> HashMap
A map in which the key is a representative, and the value is a collection of all
words with that representative. -> HashMap
A map in which the key is a word, and the value is a collection of all words that
differ in only one character from that word. -> HashMap

The performance of a HashMap can often be superior to a TreeMap.


The best strategy is to use the interface type Map, then change the
instantiation from a TreeMap to a HashMap, and perform timing tests.
5.6

Implementing hashCode
If a class overrides equals, it must override hashCode.
When they are both overridden, equals and hashCode must
use the same set of fields.
If two objects are equal, then their hashCode values must be
equal as well.
If the object is immutable, then hashCode is a candidate for
caching and lazy initialization.
It's a popular misconception that hashCode provides a unique
identifier for an object. It does not.
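The rules above can be illustrated with a small value class (Point is my example, not from the slides): equals and hashCode are overridden together, over the same fields, so equal objects get equal hash codes:

```java
import java.util.Objects;

public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;     // same fields as hashCode
    }

    @Override public int hashCode() {
        return Objects.hash(x, y);       // same fields as equals
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2), b = new Point(1, 2);
        if (!a.equals(b)) throw new AssertionError();
        // equal objects MUST have equal hash codes (the contract);
        // unequal objects MAY share a hash code, since it is not a unique id
        if (a.hashCode() != b.hashCode()) throw new AssertionError();
    }
}
```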

Caching the Hash Code


A classic time-space tradeoff is caching the hash code:
Each String object stores internally the value of its hashCode.
Why? Because computing the hashCode is expensive.
This works only because Strings are immutable.
If the String were allowed to change, it would invalidate the hashCode, and the hashCode
would have to be reset back to 0.

Excerpt of the String class hashCode

5.6

Hash Tables w/ Worst-Case O(1) Access


We want to obtain O(1) worst-case cost. Why?
In applications such as hardware implementations of lookup tables (LUTs) for
routers and memory caches, it is important that the search have a definite
(constant) amount of completion time.

If N is known, and we are allowed to rearrange
items as they are inserted, then O(1) worst-case
cost is achievable for searches.
There are different solutions to this problem:
Perfect Hashing
Cuckoo Hashing
Hopscotch Hashing
5.7

Perfect Hashing
If we have N items, how do we lower the probability
of collisions?
One approach:
Use a separate chaining implementation, and
keep each list at most a constant number of items.
As we make more lists, the lists will on average be shorter.

Problems with this separate chaining approach:


Even with lots of lists, we might still get unlucky.
The number of lists might be unreasonably large.
5.7.1

Perfect Hashing
Even with lots of lists, we might still get unlucky:
Choose M (the number of lists) to be sufficiently large
that the probability of no collisions is at least 1/2.
If a collision is detected, we simply clear out the
table and try again using a different hash function
that is independent of the first.
Keep trying until we get no collisions.
The expected number of trials will be at most 2
(since the success probability is 1/2).
5.7.1

Perfect Hashing
The number of lists might be unreasonably large:
How large does M need to be?
M needs to be quite large: M = Theta(N^2).
If M = N^2, the table is collision free with probability at
least 1/2.
Theorem 5.2

If N balls are placed into M = N^2 bins, the probability
that some bin has more than one ball is less than 1/2.
(See textbook for the proof)

5.7.1

Perfect Hashing
Using N^2 lists is impractical.
A more practical implementation:
Use only N bins, but resolve the collisions in each bin
by using hash tables instead of linked lists.
The bins are expected to have only a few items each,
so the hash table for each bin can be quadratic in the bin
size.

5.7.1

Perfect Hashing
Perfect hashing table using secondary hash tables

5.7.1

Perfect Hashing
The scheme of perfect hashing:

The primary hash table can be constructed several times if
the number of collisions that are produced is higher than
required.
Each secondary hash table will be constructed using a
different hash function until it is collision free.

Theorem 5.3

If N items are placed into a primary hash table containing
N bins, then the total size of the secondary hash tables has
expected value at most 2N.

Perfect hashing works if the items are all known in
advance.

5.7.1

Cuckoo Hashing
If N items are randomly tossed into N bins, the
size of the largest bin is expected to be Theta(log N /
log log N).
If, at each toss, two bins were randomly chosen
and the item was tossed into the emptier bin
(at the time), then the size of the largest bin
would only be Theta(log log N), a significantly lower
number.
This is the so-called "power of two choices."
5.7.2

Cuckoo Hashing
Given N items, we maintain two tables:
each more than half empty, and
each with an independent hash function for assigning each
item to a position in that table.

Cuckoo hashing maintains the invariant that an
item is always stored in one of these two
locations.

5.7.2

Cuckoo Hashing
Item A can be at either position 0 in Table 1 or
position 2 in Table 2.
A search in a cuckoo hash table requires at most two
table accesses.

5.7.2
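The two-access search invariant can be sketched directly (class name and the particular hash functions are mine, chosen only for illustration): an item can live in exactly one of its two candidate positions, so a lookup checks at most two cells:

```java
public class CuckooLookup {
    static String[] table1 = new String[3], table2 = new String[3];

    static int hash1(String s) { return Math.floorMod(s.hashCode(), 3); }
    static int hash2(String s) { return Math.floorMod(s.hashCode() / 3, 3); }

    // A search needs at most two table accesses: one per table.
    static boolean contains(String s) {
        return s.equals(table1[hash1(s)]) || s.equals(table2[hash2(s)]);
    }

    public static void main(String[] args) {
        table1[hash1("A")] = "A";          // the item lives in one of its two spots
        if (!contains("A")) throw new AssertionError();
        if (contains("B")) throw new AssertionError();
    }
}
```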

Cuckoo Hashing
The cuckoo hashing algorithm: to insert a new item
x, first make sure it is not already there.
If the first table location is empty, the item can be
placed.

5.7.2


Cuckoo Hashing
To insert B, we can add it at location 0 in Table 1 or location 0 in Table
2.
Table 1 is already occupied by A in position 0.
Cuckoo will preemptively displace A and does not bother to look
at Table 2.

5.7.2

Cuckoo Hashing
Insertion of C is straightforward.
For insertion of D with hash locations (1, 0), the Table 1
location is already taken, but we don't look at the
Table 2 location.

5.7.2

Cuckoo Hashing
E can be easily inserted.
In order to insert F, we need to displace E, then A, and then B.

5.7.2

Cuckoo Hashing
But we cannot successfully insert G!
G has hash locations (1, 2):
Displace D,
Displace B,
Displace A,
Displace E,
Displace F,
Displace C,
Displace G: CIRCULAR DEPENDENCE!
5.7.2

Cuckoo Hashing
Fortunately, if the table's load factor is below 0.5,
the probability of a cycle is very low.
If circular dependence really occurs, we can
simply rebuild the tables with new hash functions
after a certain number of displacements are
detected.

5.7.2

Cuckoo Hashing
Cuckoo hash table implementation:
Allow an arbitrary number of hash functions.
Use a single array that is addressed by all the hash
functions (instead of two separately addressable hash
tables).
Specify the maximum load to be 0.4 (auto expansion if
higher load).
Specify how many rehashes we will perform.

5.7.2

Hopscotch Hashing
Hopscotch hashing: bound the maximal length of
the probe sequence by a predetermined constant
that is optimized to the underlying computer's
architecture. For example: MAX_DIST = 4.
This gives constant-time lookups in the worst
case.
The lookup can be parallelized to simultaneously
check the bounded set of possible locations.
5.7.3

Hopscotch Hashing
The hops tell which of the positions in the block are
occupied with cells containing this hash value. Thus
Hop[8] = 0010 indicates that only position 10
currently contains items whose hash value is 8, while
positions 8, 9, and 11 do not.

5.7.3

Hopscotch Hashing
Attempting to insert H: linear probing suggests
location 13, but that is too far, so we evict G from
position 11 to find a closer position.

5.7.3

Hopscotch Hashing
Attempting to insert I: linear probing suggests location 14, but
that is too far; consulting Hop[11], we see that G can move
down, leaving position 13 open. Consulting Hop[10] gives no
suggestions. Hop[11] does not help either (why?), so Hop[12]
suggests moving F.

5.7.3

Hopscotch Hashing
Insertion of I continues: next B is evicted, and
finally we have a spot that is close enough to the
hash value, and we can insert I.

5.7.3

Extendible Hashing
What if the full amount of data is too large to fit in
memory?
Our main concern is the number of disk accesses to get a
given data item.
N items to store, M items fit on each disk block.
Collisions will cause a number of blocks to be examined,
resulting in significant disk read cost.
When the hash table becomes too full, rehashing will be needed.
The cost will be O(N) disk accesses.

Extendible hashing:
Search: two disk accesses.
Insertion: few disk accesses.

5.9

Extendible Hashing
Original Data

5.9

Extendible Hashing
After insertion of 100100 and directory split

5.9

Extendible Hashing
After insertion of 000000 and leaf split

5.9

6. Priority Queues

Dr. Angus Yeung


Agenda for Lecture 14


Book-Keeping
Download of Textbook Source Code

Hashing (Chapter 5)
Hash Tables with Worst-Case O(1) Access
Hopscotch Hashing

Extendible Hashing

Priority Queues (Heaps) (Chapter 6)


Model
Simple Implementations
Binary Heap
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #7
LECTURE #14

Textbook Source Code


You may download the textbook's source code
here:
http://users.cis.fiu.edu/~weiss/dsaajava3/code/

Prime Number Checking


We discussed this method last time:

This is a shortcut.

Prime Number Checking


The Sieve of Eratosthenes
Finds all prime numbers up to any given limit by
iteratively marking the multiples of each prime.
A non-prime must be a composite: prime x another number.

(Figure: the numbers 2 through 30, with the multiples of 2, 3, and 5
crossed out in successive passes, leaving the primes
2, 3, 5, 7, 11, 13, 17, 19, 23, 29.)

HASHING


PRIORITY QUEUES

Chapter Overview
Priority queues are used in many applications:
Queue for the print jobs sent to a printer:
1-page jobs should be prioritized over 100-page jobs.

Operating system scheduler for a multiuser environment:


Short jobs should finish as fast as possible, taking
precedence over jobs that have already been running.

In this chapter, we will discuss:


Efficient implementation of the priority queue ADT
Uses of priority queues
Advanced implementations of priority queues
6.0

Model
Two main operations for a priority queue:
insert: the equivalent of the enqueue operation
deleteMin: the equivalent of the dequeue operation

Basic model of a priority queue:
6.1

Simple Implementations
There are several obvious ways to implement a
priority queue:
Linked List

O(1) for insertions at the front


O(N) for traversing the list to delete the minimum

Sorted Linked List

O(N) for insertions


O(1) for deleting the minimum

Binary Search Tree (BST)

O(log N) for insertions


O(log N) for deleting the minimum
Repeatedly deleting the minimum will hurt the balance of the tree
6.2

Binary Heap
A binary heap (or just heap) is a binary tree that is complete:
All levels of the tree are full, except possibly the bottom level,
which is filled from left to right.

6.3

Binary Heap
Conceptually, a heap is a binary tree,
but we can implement it as an array.
For any element in array position i:
The left child is at position 2i
The right child is at position 2i + 1
The parent is at position i / 2 (integer division)

6.3
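The index arithmetic above can be checked in a few lines. A sketch using a sample min-heap stored from index 1 (the values here are just an example heap, with array[0] unused):

```java
public class HeapIndexing {
    // Sample heap stored from index 1; array[0] is unused.
    static int[] heap = {0, 13, 21, 16, 24, 31, 19, 68, 65, 26, 32};

    static int leftChild(int i)  { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }
    static int parent(int i)     { return i / 2; }   // integer division

    public static void main(String[] args) {
        if (heap[leftChild(1)] != 21)  throw new AssertionError(); // root's children
        if (heap[rightChild(1)] != 16) throw new AssertionError();
        if (heap[parent(5)] != 21)     throw new AssertionError(); // parent of 31
    }
}
```

Starting the array at index 1 makes all three formulas work without special cases.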

Heap-Order Property
We want to find the minimum value (highest
priority) very quickly:
Make the minimum value always at the root.
Apply this rule also to roots of subtrees.
This is a weaker rule than for a binary search tree:
It is not necessary that values in the left subtree be less
than the root value and values in the right subtree be
greater than the root value.
6.3

Heap-Order Property
Two complete trees (only the left tree is a heap)

6.3

Heap Insertion
Insertion strategy, percolate up:
Repeatedly do a heap insertion on the list of values.
Percolate the hole up each time from the bottom of the heap.

Inserting 14
6.3

Heap Insertion
Procedure to insert into a binary heap:

6.3
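The percolate-up insertion can be sketched as follows (a simplified int-only version along the lines of the textbook's generic code; the fixed capacity and class name are mine). Note the sentinel trick: placing x in array[0] stops the loop at the root without an explicit bounds check:

```java
public class MinHeap {
    private int[] array = new int[100];   // fixed capacity for the sketch
    private int currentSize = 0;

    // Percolate up: slide the hole toward the root until x fits.
    public void insert(int x) {
        int hole = ++currentSize;
        for (array[0] = x; x < array[hole / 2]; hole /= 2)
            array[hole] = array[hole / 2];   // parent slides down into the hole
        array[hole] = x;
    }

    public int findMin() { return array[1]; }

    public static void main(String[] args) {
        MinHeap h = new MinHeap();
        for (int x : new int[] {13, 21, 16, 24, 31, 19, 68, 65, 26, 32}) h.insert(x);
        h.insert(14);                        // percolates up past larger parents
        if (h.findMin() != 13) throw new AssertionError();
    }
}
```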

Delete the Minimum


deleteMin strategy, percolate down:

Finding the minimum is easy; the hard part is removing it.


A hole is created at the root when the minimum is removed.
Slide the smaller of the hole's children into the hole.

6.3

Delete the Minimum


Method to perform deleteMin in a binary heap

6.3
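The percolate-down deletion can be sketched in the same style (an int-only simplification; the constructor assumes the input already satisfies heap order):

```java
public class MinHeapDelete {
    private int[] array;
    private int currentSize;

    public MinHeapDelete(int[] items) {      // assumes items already form a heap
        array = new int[items.length + 10];
        currentSize = items.length;
        System.arraycopy(items, 0, array, 1, items.length);
    }

    public int deleteMin() {
        int min = array[1];
        array[1] = array[currentSize--];     // last element fills the root hole...
        percolateDown(1);                    // ...then slides down to its place
        return min;
    }

    private void percolateDown(int hole) {
        int tmp = array[hole];
        for (int child; hole * 2 <= currentSize; hole = child) {
            child = hole * 2;
            if (child != currentSize && array[child + 1] < array[child])
                child++;                     // pick the smaller child
            if (array[child] < tmp) array[hole] = array[child];
            else break;
        }
        array[hole] = tmp;
    }

    public static void main(String[] args) {
        MinHeapDelete h = new MinHeapDelete(
            new int[] {13, 14, 16, 19, 21, 19, 68, 65, 26, 32, 31});
        if (h.deleteMin() != 13) throw new AssertionError();
        if (h.deleteMin() != 14) throw new AssertionError();  // order preserved
    }
}
```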

buildHeap
Each dashed line in the figures corresponds to two comparisons
during a call to percolateDown():
One to find the smaller child.
One to compare the smaller child to the node.

For the bound on the running time of buildHeap(), we must


bound the number of dashed lines.
Each call to percolate down a node can possibly go all the way down to the
bottom of the heap.
There can be as many dashed lines from a node as the height of the node.

The maximum number of dashed lines is the sum of the heights of all the
nodes in the heap.
We prove that this sum is O(N).

6.4

Building Heap
Sketch of buildHeap

6.4

Building Heap
Initial heap and after percolateDown(7)

6.4

Building Heap
percolateDown(6) and percolateDown(5)

6.4

Building Heap
percolateDown(4) and percolateDown(3)

6.4

Building Heap
percolateDown(2) and percolateDown(1)

6.4

Running Time for buildHeap


Prove: for a perfect binary tree of height h containing 2^(h+1) - 1
nodes, the sum of the heights of the nodes is
S = 2^(h+1) - 1 - (h+1).
There is one node (the root) at height h, 2 nodes at height h-1, 4 nodes at
height h-2, and in general, 2^i nodes at height h-i.

S = sum from i=0 to h of 2^i (h - i)
  = h + 2(h-1) + 4(h-2) + 8(h-3) + ... + 2^(h-1)(1)

(a)

Multiply both sides by 2:

2S = 2h + 4(h-1) + 8(h-2) + 16(h-3) + ... + 2^h (1)

(b)

Subtract (b) - (a). Note that 2h - 2(h-1) = 2, 4(h-1) - 4(h-2) = 4, etc.

S = -h + 2 + 4 + 8 + ... + 2^(h-1) + 2^h
  = (2^(h+1) - 1) - (h + 1)

6.4

Running Time for buildHeap


S = (2^(h+1) - 1) - (h + 1)
A complete tree is not necessarily a perfect binary tree, but it
contains between 2^h and 2^(h+1) nodes. Therefore, S = O(N),
and so buildHeap() runs in O(N) time.

6.4

Agenda for Lecture 15


Book-Keeping
Assignment #3 available on Canvas

Priority Queues (Heaps) (Chapter 6)


Leftist Heaps
Skew Heaps
Binomial Queues

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #8
LECTURE #15

Leftist Heaps
A leftist heap is a heap that supports efficient
merging.
A node insertion into a leftist heap is a merge with a
one-node tree.
Deletion of the root splits a leftist heap into two
trees, which are then merged back together.

A leftist heap tends to be unbalanced.

6.6

Null Path Length


The null path length npl(X) of any node X is the
length of the shortest path from X to a
node that has 0 or 1 child.
The npl of a node with 0 or 1 child is 0.
npl(null) = -1.

The leftist heap property:


For every node X, npl(left child) >= npl(right child).
6.6

Null Path Length


Null path lengths for two trees; only the left tree
is leftist.

6.6

Leftist Heap Merge


To merge two leftist heaps H1 and H2:
If either heap is empty, return the other one.
Otherwise, compare their roots and recursively
merge the heap with the larger root into the
right subheap of the heap with the smaller root.
Make the merged heap the right child of the
heap with the smaller root.
Swap the left and right children if necessary
to maintain the leftist property.
6.6

Leftist Heap Merge


Two leftist heaps H1 and H2.

6.6

Leftist Heap Merge


Result of merging H2 with H1's right subheap.

6.6


Leftist Heap Merge


Result of attaching the leftist heap of the previous figure
as H1's right child.

6.6

Leftist Heap Merge


Result of swapping the children of H1's root.

6.6

Leftist Heap Merge


Result of merging the right paths of H1 and H2.

6.6

Implementation of Leftist Heap


An insertion of a new value is a merge with a
single node.

6.6

Implementation of Leftist Heap


A deletion (of the root) splits the heap into two
parts, which are then merged back together.

6.6

Implementation of Leftist Heap


Driver routines for merging leftist heaps

6.6

Implementation of Leftist Heap


Actual routine to merge leftist heaps

6.6
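The merge routine described above (larger root merged into the smaller root's right subheap, then a conditional child swap to restore the leftist property) can be sketched with ints; the class layout is my simplification of the textbook's generic version:

```java
public class LeftistHeap {
    private static class Node {
        int element, npl;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    private Node root;

    public void insert(int x) { root = merge(new Node(x), root); }  // one-node merge

    public int deleteMin() {                       // split into two, merge back
        int min = root.element;
        root = merge(root.left, root.right);
        return min;
    }

    private static Node merge(Node h1, Node h2) {
        if (h1 == null) return h2;                 // empty heap: return the other
        if (h2 == null) return h1;
        if (h2.element < h1.element) { Node t = h1; h1 = h2; h2 = t; }
        h1.right = merge(h1.right, h2);            // merge into the right subheap
        if (h1.left == null || h1.left.npl < h1.right.npl) {
            Node t = h1.left; h1.left = h1.right; h1.right = t;  // restore leftist property
        }
        h1.npl = (h1.right == null) ? 0 : h1.right.npl + 1;
        return h1;
    }

    public static void main(String[] args) {
        LeftistHeap h = new LeftistHeap();
        for (int x : new int[] {6, 12, 7, 18, 24, 8}) h.insert(x);
        if (h.deleteMin() != 6) throw new AssertionError();
        if (h.deleteMin() != 7) throw new AssertionError();
    }
}
```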

Skew Heaps
Skew heaps:
A self-adjusting version of the leftist heap.
Simple to implement.
Binary trees with heap order but without any structural
constraint on the trees.
Unlike leftist heaps, skew heaps:
Contain no npl information in any node.
Perform an unconditional swap:
Leftist heaps check whether the left and right children satisfy the
leftist heap structure property and swap them only if they do not.
Skew heaps always swap, with one exception: the largest of all the nodes on the right path
does not have its children swapped.
6.7

Skew Heaps
Two skew heaps H1 and H2:

6.7

Skew Heaps
Result of merging H2 with H1's right subheap

6.7

Skew Heaps
Result of merging skew heaps H1 and H2

6.7

Binomial Queues
Binomial queues:
Similar to leftist and skew heaps in supporting the merge,
insert, and deleteMin operations.
Similar to leftist and skew heaps in having O(log N) worst-case
time per operation for those three operations.
But insertions take only constant time on average.

6.8

Binomial Queues
A binomial queue is a collection of heap-ordered trees, known as a forest.
Binomial trees B0, B1, B2, B3, and B4.

6.8

Binomial Queue Merging


Two binomial queues H1 and H2.

6.8

Binomial Queue Merging


Merge of the two B1 trees in H1 and H2

6.8

Binomial Queue Merging


Binomial queue H3: the result of merging H1 and
H2

6.8

Binomial Queue Insertion


Inserting 1 through 7 in order:

6.8

Binomial Queue deleteMin


Performing a deleteMin on H3:

6.8

Binomial Queue deleteMin


Binomial queue H: B3 with 12 removed:

6.8

Binomial Queue deleteMin


Result of applying deleteMin to H3:

6.8

Binomial Queue deleteMin


Binomial queue H3 drawn as a forest:

6.8

Binomial Queue Implementation


The binomial queue is an array of binomial trees
arranged in decreasing rank:

6.8

7. Sorting

Dr. Angus Yeung


Agenda for Lecture 17


Book-Keeping
Assignment #3
Quiz #3: 4/1/15
Mid-term: 4/6/15

Sorting (Chapter 7)
Insertion Sort
Shellsort
Heapsort
Mergesort
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #9
LECTURE #17

Insertion Sort
Number of inputs, N = 6

Insertion sort consists

of N-1 = 5 passes.

Positions 0 through
p-1 are already sorted.

The element at position p is then inserted into
the sorted portion.

7.2

Implementation of Insertion Sort


In our implementation of insertion sort, we are using comparison-based sorting.

The objects being sorted are


of type Comparable.

The element at position p is


stored in tmp.

7.2
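The comparison-based implementation described above can be sketched as follows, with tmp holding the element at position p while larger elements slide right (essentially the textbook's approach; the class name is mine):

```java
public class InsertionSort {
    public static <T extends Comparable<? super T>> void sort(T[] a) {
        for (int p = 1; p < a.length; p++) {     // positions 0..p-1 already sorted
            T tmp = a[p];                        // element at position p
            int j = p;
            for (; j > 0 && tmp.compareTo(a[j - 1]) < 0; j--)
                a[j] = a[j - 1];                 // slide larger elements right
            a[j] = tmp;                          // drop tmp into the hole
        }
    }

    public static void main(String[] args) {
        Integer[] a = {34, 8, 64, 51, 32, 21};
        sort(a);
        if (!java.util.Arrays.equals(a, new Integer[] {8, 21, 32, 34, 51, 64}))
            throw new AssertionError();
    }
}
```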

Analysis of Insertion Sort


The number of tests in the inner loop is at most
p + 1 for each value of p. Summing over all p
gives a total of 2 + 3 + ... + N = O(N^2).

If the input is pre-sorted,

the running time is O(N).
7.2

A Lower Bound for Simple Sorting Algorithms

An inversion in an array of numbers is any ordered


pair (i, j) having the property that i < j but a[i] > a[j].
The two values are out of order.
Assume we're sorting from lowest to highest.
For example: [34, 8, 64, 51, 32, 21] has 9 inversions:
(34, 8), (34, 32), (34, 21), (64, 51), (64, 32), (64, 21), (51,
32), (51, 21), and (32, 21).

The number of inversions is exactly the number of swaps


that need to be performed by insertion sort.
A sorted array has no inversions.
7.3

A Lower Bound for Simple Sorting Algorithms


Theorem 7.1 The average number of inversions in an
array of N distinct elements is N(N-1)/4.
Proof:
Consider L, a list of elements, and Lr, the same list in reverse order.
Any pair (x, y) represents an inversion in either L or Lr.
Since there are N(N-1)/2 pairs, the average list has half this amount, or
N(N-1)/4 inversions.

Theorem 7.2 Any algorithm that sorts by exchanging


adjacent elements requires Ω(N²) time on average.
Proof:
The average number of inversions is N(N-1)/4, or Ω(N²).
Each swap removes only one inversion, so Ω(N²) swaps are required.
7.3

A Lower Bound for Simple Sorting Algorithms

The lower-bound proof is valid not only for


insertion sort but also for other simple algorithms
such as bubble sort and selection sort.
A sorting algorithm makes progress by eliminating
inversions, and to run efficiently, it must eliminate
more than just one inversion per exchange.

7.3

Shellsort
Other notes on insertion sort
Insertion sort is fast if the array is nearly sorted
Parallelism? If we can swap non-adjacent values, we may
be able to remove more than one inversion at a time.
If we can get the array nearly sorted as soon as possible,
insertion sort can finish the job quickly.

Donald Shell invented the Shellsort algorithm in 1959


based on these observations.
Shellsort was one of the first algorithms to break the
quadratic time barrier.
7.4

Shellsort
Basic principle:
We start by comparing elements that are distant
The distance, h, between comparisons decreases as
the algorithm runs until the last phase, in which
adjacent elements are compared.
This is referred to as diminishing increment sort.

7.4

Shellsort
Shellsort uses a sequence, h1, h2, ..., ht, called the
increment sequence.
Like insertion sort, except that we compare values
that are h elements apart in the list: a[i] and a[i+hk]
hk diminishes after completing a pass, e.g., 5, 3, and 1.
The file is said to be hk-sorted. For example, 5-sorted,
3-sorted, etc.
The final value, h1, must be 1. So the final pass is
always a regular insertion sort.
7.4

A Shellsort Example
Shellsort after each pass, using [1, 3, 5] as the increment
sequence.

The choice of h values affects how long the sort takes.

7.4

Implementation of Shellsort
Shellsort routine using Shell's increments (better
increments are possible)
Shell's increments:
ht = ⌊N/2⌋ and hk = ⌊hk+1/2⌋

7.4
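A Shellsort routine using Shell's increments can be sketched as follows; the class name is illustrative. Each phase is an insertion sort over elements that are `gap` apart, and the final `gap = 1` phase is a plain insertion sort.

```java
// Shellsort with Shell's increments: gap starts at N/2 and halves
// each phase, so the last phase (gap == 1) is ordinary insertion sort.
public class ShellsortDemo {
    public static <T extends Comparable<? super T>> void shellsort(T[] a) {
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                T tmp = a[i];
                int j = i;
                // insertion sort on the subsequence gap apart
                while (j >= gap && tmp.compareTo(a[j - gap]) < 0) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = tmp;
            }
        }
    }
}
```

Swapping in Hibbard's or Sedgewick's increments only changes the outer loop's gap sequence.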

Worst-Case Analysis of Shellsort


Shellsort is difficult to analyze
Theorem 7.3 The worst-case running time of
Shellsort, using Shell's increments, is Θ(N²).
Proof:

provide cases for both O(N²) and Ω(N²).


See textbook page 276.

Theorem 7.4 The worst-case running time of


Shellsort using Hibbard's increments is Θ(N^(3/2)).
Proof:

see textbook page 277.

Hibbard's increments: 1, 3, 7, ..., 2^k − 1


7.4

Insertion Sort versus Shellsort


Insertion sort is simple to implement but the average
running time is slow at Θ(N²)
It swaps only adjacent values
An element may traverse a long way through the array
during a pass to arrive at its proper place.

Shellsort is also simple to implement but the average


running time is much improved at O(N^(3/2))
Early passes with large h make it easier for later passes
with smaller h to sort
The choice of a good increment sequence for h is
important.

Heapsort
Heapsort is based on using a priority queue, with running time O(N log N)
To sort N values into increasing order:
Build a heap: running time = O(N)
Perform N deletions: O(log N) each
Sorted values can be appended to the end of the underlying array

The element removed will be


appended to the end.

7.5
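The heapsort idea above, using a max-heap built in the array itself so each deleted maximum can be dropped into the freed slot at the end, might be sketched like this (class and method names are illustrative):

```java
// Heapsort: build a max-heap in place (O(N)), then repeatedly swap the
// root (maximum) to the end of the shrinking heap and percolate down.
public class HeapsortDemo {
    private static int leftChild(int i) { return 2 * i + 1; }

    private static <T extends Comparable<? super T>> void percDown(T[] a, int i, int n) {
        T tmp = a[i];
        for (int child; leftChild(i) < n; i = child) {
            child = leftChild(i);
            if (child != n - 1 && a[child].compareTo(a[child + 1]) < 0)
                child++;                              // pick the larger child
            if (tmp.compareTo(a[child]) < 0)
                a[i] = a[child];                      // move child up
            else
                break;
        }
        a[i] = tmp;
    }

    public static <T extends Comparable<? super T>> void heapsort(T[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)   // build max-heap: O(N)
            percDown(a, i, a.length);
        for (int i = a.length - 1; i > 0; i--) {      // N-1 deleteMaxes
            T t = a[0]; a[0] = a[i]; a[i] = t;        // move max to the end
            percDown(a, 0, i);
        }
    }
}
```

Using a max-heap (instead of deleteMin into a second array) is what lets the sorted output share the heap's array.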

Mergesort
Mergesort uses the strategy of Divide and Conquer
Divide: split the list of values into two halves and
recursively sort each half.
Conquer: merge the two sorted halves back together

7.6

Mergesort Ch.2 Example


We discussed the Divide and Conquer
strategy for the Max Subsequence Sum
Problem in Chapter 2: Algorithm 3

Mergesort Illustrated
The basic merging algorithm takes two input
arrays A and B, an output array C, and three
counters, Actr, Bctr, and Cctr

7.6

Mergesort Illustrated
If the array A contains 1, 13, 24, 26, and B contains 2, 15,
27, 38, then the algorithm proceeds as follows:
First, a comparison is done between 1 and 2.
1 is added to C, and then 13 and 2 are compared.

7.6

Mergesort Illustrated
2 is added to C, and then 13 and 15 are compared.

7.6

Mergesort Illustrated
13 is added to C, and then 24 and 15 are compared. This
proceeds un'l 26 and 27 are compared.

7.6

Mergesort Illustrated
26 is added to C, and the A array is exhausted.

7.6

Mergesort Illustrated
The remainder of the B array is then copied to C.

7.6
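The merging steps just illustrated (three counters walking A, B, and C) can be written directly in Java; here the two "input arrays" are the two sorted halves of one array, as the recursive mergesort uses them. Names like `MergeDemo` are illustrative.

```java
// Mergesort: merge() combines sorted halves a[left..mid] and
// a[mid+1..right] into tmp using three counters, then copies back.
public class MergeDemo {
    public static void merge(int[] a, int[] tmp, int left, int mid, int right) {
        int actr = left, bctr = mid + 1, cctr = left;
        while (actr <= mid && bctr <= right)          // compare front elements
            tmp[cctr++] = (a[actr] <= a[bctr]) ? a[actr++] : a[bctr++];
        while (actr <= mid)   tmp[cctr++] = a[actr++];   // rest of A
        while (bctr <= right) tmp[cctr++] = a[bctr++];   // rest of B
        for (int i = left; i <= right; i++)
            a[i] = tmp[i];
    }

    public static void mergesort(int[] a, int[] tmp, int left, int right) {
        if (left < right) {
            int mid = (left + right) / 2;
            mergesort(a, tmp, left, mid);             // sort left half
            mergesort(a, tmp, mid + 1, right);        // sort right half
            merge(a, tmp, left, mid, right);          // combine
        }
    }
}
```

With A = {1, 13, 24, 26} and B = {2, 15, 27, 38} stored side by side, `merge` reproduces exactly the comparison sequence in the slides.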

Analysis of Mergesort
What is the running time for Mergesort?
Let T(N) be the time to sort N values.
For N=1, the time to mergesort is constant, O(1)
Otherwise, it takes 2T(N/2) for the recursive mergesorts, plus
N to do the merge

We have a recurrence relation: T(1) = 1; T(N) = 2T(N/2) + N
7.6

Analysis of Mergesort
Divide both sides by N:

T(N)/N = T(N/2)/(N/2) + 1

Telescoping down through N/2, N/4, ..., 2 gives
T(N)/N = T(1)/1 + log N, so T(N) = N + N log N = O(N log N)
7.6

Agenda for Lecture 20


Book-Keeping
Review of Quiz #3
Review of Mid-term
Review of Assignment #3

Sorting (Chapter 7)
Quicksort
Picking the Pivot
Partitioning Strategy

WEEK #10
LECTURE #20

Quicksort
Quicksort is one of the most elegant and useful algorithms
in computer science.
A fast divide-and-conquer recursive algorithm
Very tight and highly optimized inner loop

Performance
Average running time is O(N log N)
Worst-case performance is O(N²)

Basic idea:
Find a good pivot value in the list
Recursively sort the two sublists
Similar to mergesort but does not require merging or a temp array.
7.7

Quicksort: Algorithm
Simple recursive sorting algorithm
Input is divided into three sublists:
smaller, same, and larger

This is the pivot item.

Recursive call on the smaller


and larger sublists

7.7

Quicksort: Example
1. Select Pivot
2. Partition
3. Recursive Sort
7.7

Why is Quicksort faster?


1. Select Pivot
2. Partition
Performed in place and very efficient

3. Recursive Sort
Sublists may not be of equal size.
7.7

Quicksort: Picking the Pivot


Use the first element as the pivot
Acceptable: if the input is random
Poor result: if the input is presorted or in reverse order

A safer maneuver: choose the pivot randomly


But random number generation is an expensive operation

Median-of-Three Partitioning
The median of the array is hard to calculate
Alternative: pick three elements randomly and use the median
of these three as the pivot.
Better for implementation: pick the left, right and center elements
Result: reduces the number of comparisons by about 14%
7.7.1

Quicksort: Partitioning Strategy


Here is one partitioning strategy used in practice:
1. Move the pivot to the last element
2. Move i forward and j backward
3. Swap a[i] and a[j] if needed
4. Stop when i and j cross over
5. Move the pivot to the middle, at i
7.7.2
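The pivot selection and partitioning strategy above can be combined into one Java sketch. This follows the common median-of-three variant (the median is hidden at `right - 1` rather than the very last slot, which is a standard refinement); names and the cutoff value are illustrative.

```java
// Quicksort with median-of-three pivot selection and the i/j crossing
// partition; subarrays at or below CUTOFF fall back to insertion sort.
public class QuicksortDemo {
    private static final int CUTOFF = 3;

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // sort a[left], a[center], a[right]; hide the median pivot at right-1
    private static int median3(int[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[center] < a[left])  swap(a, left, center);
        if (a[right]  < a[left])  swap(a, left, right);
        if (a[right]  < a[center]) swap(a, center, right);
        swap(a, center, right - 1);
        return a[right - 1];
    }

    public static void quicksort(int[] a, int left, int right) {
        if (left + CUTOFF <= right) {
            int pivot = median3(a, left, right);
            int i = left, j = right - 1;
            for (;;) {
                while (a[++i] < pivot) {}             // move i forward
                while (a[--j] > pivot) {}             // move j backward
                if (i < j) swap(a, i, j);             // swap if needed
                else break;                           // stop when they cross
            }
            swap(a, i, right - 1);                    // restore pivot at i
            quicksort(a, left, i - 1);                // sort smaller sublist
            quicksort(a, i + 1, right);               // sort larger sublist
        } else {                                      // tiny subarray
            for (int p = left + 1; p <= right; p++) { // insertion sort
                int tmp = a[p], k = p;
                while (k > left && tmp < a[k - 1]) {
                    a[k] = a[k - 1];
                    k--;
                }
                a[k] = tmp;
            }
        }
    }
}
```

The cutoff is needed because median-of-three degenerates on subarrays of fewer than three elements.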

Agenda for Lecture 21


Book-Keeping
Sorting (Chapter 7)

WEEK #11
LECTURE #21

7.7 Quicksort
7.7.5 Analysis of Quicksort
7.7.6 A Linear-Expected-Time Algorithm for Selection

7.8 A General Lower Bound for Sorting


7.11 Linear-Time Sorts: Bucket Sort and Radix Sort
7.12 External Sorting

Quicksort: Analysis
What is the running time to quicksort a list of N?
Partition the array into two subarrays
(linear, cN, time).
A recursive call on each subarray.

A recurrence relation: T(N) = T(i) + T(N − i − 1) + cN

where i is the number of values in the left partition.

The performance of quicksort is highly dependent on ...


... the quality of the choice of pivot.


7.7.5

Quicksort: Worst-Case Analysis

The pivot is always the smallest value of the partition, so i = 0:
T(N) = T(N − 1) + cN, which telescopes to T(N) = O(N²).

7.7.5

Quicksort: Best-Case Analysis


The pivot is always the median, so each subarray is half the size:
T(N) = 2T(N/2) + cN, which gives T(N) = O(N log N).

7.7.5

Quicksort: Avg-Case Analysis


Each size for a subarray after partitioning is equally likely, with probability 1/N:

T(i) = T(N − i − 1) = (1/N) Σ_{j=0}^{N−1} T(j)

Therefore:

T(N) = (2/N) Σ_{j=0}^{N−1} T(j) + cN

Multiply through by N:

N T(N) = 2 Σ_{j=0}^{N−1} T(j) + cN²    (a)

(N − 1) T(N − 1) = 2 Σ_{j=0}^{N−2} T(j) + c(N − 1)²    (b)

Subtract (a) − (b):

N T(N) − (N − 1) T(N − 1) = 2T(N − 1) + 2cN − c

7.7.5

Quicksort: Avg-Case Analysis


N T(N) − (N − 1) T(N − 1) = 2T(N − 1) + 2cN − c
Rearrange and drop the insignificant c:

N T(N) = (N + 1) T(N − 1) + 2cN
Divide through by N(N + 1):

T(N)/(N + 1) = T(N − 1)/N + 2c/(N + 1)

Telescope:

T(N − 1)/N = T(N − 2)/(N − 1) + 2c/N
T(N − 2)/(N − 1) = T(N − 3)/(N − 2) + 2c/(N − 1)
...
T(2)/3 = T(1)/2 + 2c/3

7.7.5

Quicksort: Avg-Case Analysis


Add and cancel:

T(N)/(N + 1) = T(1)/2 + 2c Σ_{i=3}^{N+1} 1/i

Recall the harmonic number: Σ_{i=3}^{N+1} 1/i ≈ log_e N

And so:

T(N)/(N + 1) = O(log N)

Therefore:

T(N) = O(N log N)

7.7.5

A General Lower Bound for Sorting


For any sorting algorithm that uses only
comparisons, O(N log N) is as good as we can do.
Mergesort and heapsort are optimal to within a
constant factor.

How can we prove that?

7.8

A General Lower Bound for Sorting


Prove: Any sorting algorithm that uses only
comparisons requires ⌈log(N!)⌉ comparisons in the
worst case and log(N!) comparisons on average.
We can use decision trees for the proof.
log(N!) = Θ(N log N)

7.8

A General Lower Bound for Sorting


A decision tree for a three-element sort
1. Worst case: depth of the deepest leaf
2. Avg case: average depth of the leaves

7.8

A General Lower Bound for Sorting


Lemma 7.1: Let T be a binary tree of depth d. Then T
has at most 2^d leaves.
Lemma 7.2: A binary tree with L leaves must have
depth at least ⌈log L⌉.
Theorem 7.6: Any sorting algorithm that uses only
comparisons between elements requires at least
⌈log(N!)⌉ comparisons in the worst case.
Theorem 7.7: Any sorting algorithm that uses only
comparisons between elements requires Ω(N log N)
comparisons, since log(N!) = Ω(N log N).
7.8

Linear-Time Sorts: Bucket Sort and Radix Sort

Any general sorting algorithm that uses only


comparisons requires Ω(N log N) time in the worst
case.
But it is still possible to sort in linear time in some
special cases when extra information is available.
Two special cases:
Bucket sort: input is positive integers smaller than M
Radix sort: input is assumed to be only small integers
7.11

Bucket Sort
Input: A1, A2, ..., AN, positive integers < M.
Bucket Sort Algorithm:
Keep an array called count, of size M, initialized with
all 0s.
When Ai is read, increment count[Ai] by 1.
After all the input is read, scan the count array,
printing out a representation of the sorted list.

Note: count has M cells, or buckets.


The algorithm takes O(M + N)
7.11
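The three-step bucket sort algorithm above translates almost line for line into Java; the class name is illustrative, and for simplicity this sketch returns a new sorted array rather than printing.

```java
// Bucket sort for non-negative integers strictly smaller than m:
// tally each value into count[], then scan the buckets in order.
public class BucketSortDemo {
    public static int[] bucketSort(int[] input, int m) {
        int[] count = new int[m];                 // m buckets, all zero
        for (int x : input)
            count[x]++;                           // read Ai: count[Ai]++
        int[] out = new int[input.length];
        int k = 0;
        for (int v = 0; v < m; v++)               // scan the count array
            for (int c = 0; c < count[v]; c++)
                out[k++] = v;
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = bucketSort(new int[]{3, 1, 4, 1, 5}, 10);
        System.out.println(java.util.Arrays.toString(sorted)); // [1, 1, 3, 4, 5]
    }
}
```

The two loops make the O(M + N) cost visible: one pass over the N inputs, one pass over the M buckets.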

Radix Sort
Radix sort is sometimes known as card sort. It
was used by the old electromechanical IBM card
sorters to sort punched cards.

7.11

Radix Sort
Input: 10 numbers in the range 0 to 999.
Principle: too many buckets, so bucket sort is not so
useful here. How about using several passes of
bucket sort?
Perform the bucket sorts in reverse digit order, starting
with the least significant digit first.

7.11

Radix Sort: Example


For example, the input sequence is 64, 8, 216, 512, 27,
729, 0, 1, 343, 125
Running time: O(p(N+b)), p = #passes, N = #inputs,
b = #buckets

7.11
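A least-significant-digit radix sort over base-10 buckets, as in the example, might look like the following sketch (names are illustrative). Stability of each bucket pass is what makes the later, more significant passes preserve earlier orderings.

```java
import java.util.ArrayList;
import java.util.List;

// LSD radix sort with 10 buckets; 'passes' is the number of decimal
// digits to process (3 for inputs in the range 0..999).
public class RadixSortDemo {
    public static void radixSort(int[] a, int passes) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++)
            buckets.add(new ArrayList<>());
        int div = 1;                                  // 1, 10, 100, ...
        for (int p = 0; p < passes; p++, div *= 10) {
            for (int x : a)                           // distribute by digit
                buckets.get((x / div) % 10).add(x);
            int k = 0;
            for (List<Integer> bucket : buckets) {    // collect in order
                for (int x : bucket)
                    a[k++] = x;
                bucket.clear();
            }
        }
    }
}
```

Each of the p passes touches all N inputs and all b buckets, matching the O(p(N + b)) bound above.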

External Sorting
Internal sorting algorithms take advantage of the
fact that memory is directly addressable.
External sorting algorithms are designed to handle
very large inputs, where the input is much too large to fit
into memory.
Sometimes the time it takes to read the input is
significant compared to the time to sort the input,
even though sorting is an O(N log N) operation and
reading the input is only O(N): in reality, the constant
hidden in reading the input is much larger.

7.12

Model for External Sorting


For external sorting, we consider work on tapes,
which are probably the most restrictive storage
medium.
Tapes can be efficiently accessed only in sequential
order (in either direction).
In our model, we assume we have at least three
tape drives to perform the sorting: two drives for
efficient sorting; the third drive simplifies matters.
If only one tape drive can be used, any algorithm will
require Ω(N²) tape accesses.

7.12

External Sorting: The Simple Algorithm


M = 3

Assume that the internal memory can hold and sort M records at a
time. So M records are read at a time from the input tape.
We use four tapes: two input
and two output tapes
Merge

Each set of sorted records is called a run.


After this is done, we rewind all the tapes.

7.12

External Sorting: The Simple Algorithm


Merged

Rewind all four tapes and repeat the same steps.

Continue the process until we get one run of length N.

7.12

External Sorting: The Simple Algorithm


The simple algorithm for external sorting requires
⌈log₂(N/M)⌉ passes, plus the initial run-constructing
pass.
For example: with 10 million records of 128 bytes each and
4 MB of internal memory, the first pass will
create 320 runs. We would then need 9 more
passes to complete the sort.

7.12

External Sorting: Multiway Merge


Multiway merge: if we have extra tapes, then we
can expect to reduce the number of passes
required to sort our input.
The number of passes required using k-way
merging is ⌈log_k(N/M)⌉.

7.12

External Sorting: Multiway Merge


We then need two more passes of three-way merging to complete the sort.

⌈log₃(13/3)⌉ = 2

7.12

External Sorting: Polyphase Merge


The k-way merging strategy requires the use of 2k
tapes. It is possible to get by with only k+1 tapes.

Split the number of runs into two Fibonacci numbers


F(N-1) and F(N-2) (below).

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...


7.12

External Sorting: Replacement Selection


Replacement selection: as soon as the first record
is written to an output tape, the memory it used
becomes available for another record.
Initially, M records are read into memory and
placed in a priority queue.
We perform a deleteMin, writing the smallest
record to the output tape

7.12

External Sorting: Replacement Selection

7.12

8. The Disjoint Set Class

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

List Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms

Final Exam

Agenda for Lecture 22


Book-Keeping
Ch.8 The Disjoint Set Class
Equivalence Relations
The Dynamic Equivalence Problem
Basic Data Structure
Smart Union Algorithms

WEEK #11
LECTURE #22

Reminder: Academic Integrity

0.67% ??!!!!

Wearable Computing
Interested in wearable computing projects?
Android programming
Android L
Android Wear
Google Fit

iOS Programming
Swift
ANCS

Arduino

Intel Edison Development Kit


Node.js

C Programming

Bluetooth Low Energy (BLE) Protocols

Introduction
The Disjoint Set Class is an efficient data structure to
solve the equivalence problem.
The data structure is simple to implement
The implementation is extremely fast
The analysis is extremely difficult

In this chapter, we will


Show a relevant implementation
Increase its speed, using just two simple observations
Analyze the running time
See a simple application
8.0

Equivalence Relations
Define a relation R on members of a set S:
For each pair of elements (a, b), where a and b are in
S, a R b is either true or false.
If a R b is true, then a is related to b.

An equivalence relation R satisfies three


properties
Reflexive: a R a for all a in S.
Symmetric: a R b if and only if b R a.
Transitive: If a R b and b R c then a R c.
8.1

Example of Equivalence


Electrical connectivity, where all connections are by metal
wires
Reflexive:

a R a for all a in S.
Any component is connected to itself.

Symmetric:

a R b if and only if b R a.
If a is electrically connected to b, then b must be electrically
connected to a.

Transitive:

If a R b and b R c then a R c.
If a is connected to b and b is connected to c, then a is connected
to c.
8.1

Example of Equivalence


Two cities in the same country, where the cities are
connected by roads.
Reflexive:

a R a for all a in S.
Any city is connected to itself.

Symmetric:

a R b if and only if b R a.
If it is possible to travel from city a to city b by roads, then it is
also possible to travel from city b to city a by roads.

Transitive:

If a R b and b R c then a R c.
If it is possible to travel from city a to city b and from city b to city
c, then it is possible to travel from city a to city c.
8.1

The Dynamic Equivalence Problem


If we use ~ to denote an equivalence relation, the problem
is then to decide, for any a and b, if a ~ b.
Example:

Given the set: {a1, a2, a3, a4, a5}


There are 25 pairs of elements, each either related or not.
The information a1~a2, a3~a4, a5~a1, a4~a2 implies that all pairs
are related.

The equivalence class of an element a ∈ S is the subset of S


that contains all the elements that are related to a.
Every member of S appears in exactly one equivalence class.
To decide if a~b, we need only check whether a and b are in the
same equivalence class.

8.2

The Dynamic Equivalence Problem


Disjoint Sets: Si ∩ Sj = ∅

The input is initially a collection of N sets, each with one


element. The initial representation is that all relations
(except reflexive relations) are false. Each set has a
different element, so that the sets are disjoint.
Two permissible operations:
find -> returns the name of the set (equivalence class)
containing a given element.
union -> to add the relation a ~ b, check if a and b are in the same
equivalence class. If not, apply union to create a new set: Sk = Si ∪ Sj

find(a)==find(b) is true if and only if a and b are in


the same set.

8.2

Performance of find and union


For the find operation to be fast, we could
maintain, in an array, the name of the equivalence
class for each element. The running time for find is then a
simple O(1) lookup.
For the union(a, b) operation, we scan down the
array, changing every entry in the equivalence class i of a to the
equivalence class j of b. A sequence of N-1 such unions
takes Θ(N²).
Instead, we want a solution to the union/find problem
that makes unions easy but finds hard.
8.2

Basic Data Structure


A find operation doesn't need to return any specific
name; just that finds on two elements return the
same answer if and only if they are in the same set.
We can use a tree to represent each set.
For example, we have eight elements, initially in
different sets.

8.3

Basic Data Structure


After union(4, 5) and union(6, 7)

8.3

Basic Data Structure


After union(4, 6)

Implicit representation of the previous tree

8.3

Basic Data Structure


A find(x) on element x is performed by
returning the root of the tree containing x.
The time to perform this operation is proportional
to the depth of the node representing x.
The worst case is a tree of depth N-1, so
the worst-case running time of a find is Θ(N).

8.3

Basic Data Structure

8.3

Basic Data Structure

8.3

Agenda for Lecture 23


Book-Keeping
Revised class schedule
Final Exam Schedule
Assignment #4

Ch.8 The Disjoint Set Class


Smart Union Algorithms
An Application

WEEK #12
LECTURE #23

Smart Union Algorithms


Result of union-by-size if the next operation were
union(3, 4). For union-by-size, the smaller tree
becomes a subtree of the larger.
If unions are done by size, the depth of any node
is never more than log N.

8.4

Smart Union Algorithms


Result of an arbitrary union. If we don't use
union-by-size, a deeper forest may be formed.

8.4

Smart Union Algorithms


Worst-case tree for N=16. This happens when all
unions are between equal-sized trees.

8.4

Smart Union Algorithms


Union-by-height: we can also keep track of the
height, instead of the size. The depth of any tree is at
most O(log N).

8.4

Smart Union Algorithms


Source code for union-by-height.

8.4

Path Compression
Problems with the union/find algorithms
The worst case of O(M log N) for the union/find
algorithm can occur fairly easily and naturally.
If there are many more finds than unions, this
running time is bad.

8.5

Path Compression
Path compression is an operation that does
something clever during the find operation.
Path compression is performed during a find
operation and is independent of the strategy used
to perform unions.
Suppose the operation is find(x). Then the effect
of path compression is that every node on the
path from x to the root has its parent changed to
the root.
8.5

Path Compression
An example of path compression after find(14) on the
generic worst tree. Nodes 12 & 13, and nodes 14 & 15, are
now closer to the root.

8.5

Path Compression
Code for the disjoint set find with path
compression.

That's the only change


required for path compression

8.5
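The textbook's array representation, union-by-height, and the one-line path-compression change can be combined into a compact Java sketch; the class name `DisjSets` follows the book's convention, and `find` here takes any element while `union` expects roots.

```java
// Disjoint sets: s[i] < 0 marks a root (the value encodes height),
// otherwise s[i] is the index of i's parent.
public class DisjSets {
    private final int[] s;

    public DisjSets(int numElements) {
        s = new int[numElements];
        java.util.Arrays.fill(s, -1);     // every element is its own tree
    }

    public int find(int x) {
        if (s[x] < 0)
            return x;                     // x is a root
        return s[x] = find(s[x]);         // path compression: the only change
    }

    // union-by-height; root1 and root2 must be roots, as in the textbook
    public void union(int root1, int root2) {
        if (s[root2] < s[root1])          // root2's tree is deeper
            s[root1] = root2;
        else {
            if (s[root1] == s[root2])
                s[root1]--;               // equal heights: new root grows by 1
            s[root2] = root1;
        }
    }
}
```

Reproducing the slides' example: after `union(4, 5)`, `union(6, 7)`, and a union of their roots, `find(5)` and `find(7)` return the same root, while element 0 stays in its own set.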

An Application
An example of the use of the union/find data structure is the
generation of mazes.

8.7

An Application
Initial state: all walls up, all cells in their own set.

8.7

An Application
At some point in the algorithm: several walls down, sets have merged; if
at this point the wall between 8 and 13 is randomly selected, this wall is
not knocked down, because 8 and 13 are already connected.

8.7

An Application
The wall between squares 18 and 13 is randomly selected; this wall is
knocked down, because 18 and 13 are not already connected; their sets
are merged.

8.7

An Application
Eventually, 24 walls are knocked down; all elements
are in the same set.

8.7

Wanna Build a Maze Game?


Many great references out there:

Demos of Maze Generation Algorithms:


http://www.jamisbuck.org/presentations/rubyconf2011/
index.html
Maze Generation in 3D:
http://totologic.blogspot.com/2013/04/maze-generation-
in-3d.html
Maze Tutorial (in Java):
http://forum.codecall.net/topic/63862-maze-tutorial/
Maze Generation: Eller's Algorithm (in Ruby):
http://weblog.jamisbuck.org/2010/12/29/maze-
generation-eller-s-algorithm

Material in the Backup Section is out of the scope of this class.

BACKUP

Slowly Growing Functions

Assume that f(N) is a well-defined function that


reduces N. For the recurrence equation above, we
iteratively apply f(N) until we reach 1 or less.
We call the solution to this equation f*(N).
Example: analysis of a binary tree
f(N) = N/2; each step halves N.
We do this at most log N times until N reaches 1
So we have f*(N) ≈ log N
8.6

Different Values of the Iterated Function

The solution T(N) = log* N is known as


the iterated logarithm.

8.6

Iterated Logarithm
For practical purposes, the iterated logarithm with base 2
has a value no more than 5.

Note: lg denotes the binary logarithm

lg* 4 = 2

8.6

An Analysis by Recursive Decomposition


Establishing a tight bound on the running time of a sequence of
M = Θ(N) union/find operations:

Lemma 8.1 When executing a sequence of union instructions, a


node of rank r > 0 must have at least one child of rank 0, 1, ..., r-1.
Lemma 8.2 At any point in the union/find algorithm, the ranks of the
nodes on a path from a leaf to a root increase monotonically.

8.6

Partial Path Compression

Algorithm A is our standard sequence of union-by-rank and find with path


compression operations. We design an Algorithm B to perform all the unions prior to
any find.
Then each find operation in Algorithm A is replaced by a partial find operation in
Algorithm B.
A partial find operation specifies the search item and the node up to which the path
compression is performed. The node that will be used is the node that would have been
the root at the time the matching find was performed in Algorithm A.

8.6

9. Graph Algorithms

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Quiz 1

List Stacks and Queues


Trees
Hashing
Priority Queues
Sorting

Assignment 2

Quiz 2
Quiz 3

Assignment 3

Mid-Term

The Disjoint Set Class


Graph Algorithms

Assignment 4

Algorithm Design Techniques

This Lecture

Quiz 4
Final Exam

Agenda for Lecture 24


Book-Keeping
Ch.9 Graph Algorithms
Introduction
Definitions
Topological Sort
Shortest-Path Algorithms

WEEK #12
LECTURE #24

Introduction
We are going to discuss several common problems in
graph theory.
In many applications, graph algorithms are too slow unless we pay
attention to the choice of data structure.

In this chapter, we will

Show the conversion of real-life problems to problems on


graphs
Give algorithms to solve several common graph problems
Show how the proper choice of data structures can
drastically reduce the running time of these algorithms
Learn depth-first search and show how it can be used to
solve several seemingly nontrivial problems in linear time.
9.0

Graph Theory
In computer science, graph theory is the study of
graphs, which are mathematical structures used
to model pairwise relations between objects.

Seven Bridges of Königsberg


The problem was to find a walk through the city that
would cross each bridge once and only once.
The islands could not be reached
by any route other than the
bridges, and every bridge must
have been crossed completely
every time; one could not walk
halfway onto the bridge and then
turn around and later cross the
other half from the other side.
Leonhard Euler proved that the
problem has no solution: there is
no walk that crosses every bridge
exactly once without retracing.

Seven Bridges of Königsberg


Euler pointed out that the choice of route inside each land
mass is irrelevant.
The only important feature of a route is the sequence of
bridges crossed. That allowed him to reformulate the
problem in abstract terms.
The resulting mathematical structure is called a graph.

Applications for Graphs


As one of the most versatile data structures in
computer science, graphs find their way into many
applications:
Representation of a map of locations and distances
between them;
State transitions in computer algorithms
Relationships such as family trees, business and
military organizations, etc.
Connectivity in computer and communications
networks.

Social Network Diagram


A social network
diagram displaying
friendship ties
among a set of
Facebook users.

Graph, Vertices and Edges


A graph G = (V, E) is a set of vertices V and a set of
edges (or arcs) E.
An edge is a pair (v, w), where v and w are in V.
If the pair is ordered, the graph is directed and is called a
digraph.
Vertex w is adjacent to vertex v if and only if (v, w) is in E
In an undirected graph, both (v, w) and (w, v) are in E.
v is adjacent to w, and w is adjacent to v.

An edge can have a weight or cost component.
9.1

Planning a Road Trip


Round trip: ~2,000 miles

Path
A path is a sequence of vertices w1, w2, w3, ..., wN
where (wi, wi+1) is in E, for 1 ≤ i < N.
The length of the path is the number of edges on the
path.
A simple path has all distinct vertices, except that the
first and last can be the same.

9.1

Cycle
A cycle in a directed graph is a path of length at least 1
where w1 = wN.
A directed graph with no cycles is acyclic.
A DAG is a directed acyclic graph.

9.1

More on Definitions
An undirected graph is connected if there is a path
from every vertex to every other vertex.
A directed graph with this property is strongly
connected.
A directed graph is weakly connected if it is not
strongly connected but the underlying undirected
graph is connected.

A complete graph has an edge between every pair


of vertices.

9.1

Representation of Graphs
Represent a directed graph with an adjacency list.
For each vertex, keep a list of all adjacent vertices.

9.1

Topological Sort
A topological sort is an ordering of vertices in a
directed acyclic graph, such that if there is a path
from vi to vj, then vi comes before vj in the
ordering.

9.2

Topological Sort
We can use a graph to represent the prerequisites
in a course of study.
A directed edge
from Course A to
Course B means
that Course A is
a prerequisite
for Course B.

9.2

Topological Sort
Topological sort example using a queue.
Start with vertex v1.
On each pass, remove the vertices with indegree = 0.
Subtract 1 from the indegree of the adjacent vertices.

The indegree of a vertex v is the number of edges (u, v).

9.2

Topological Sort
Result of applying topological sort to the graph
A vertex is put on the queue as soon as its indegree falls to 0.

Pseudo-code for Topological Sort

Running time = O(|E| + |V|)

9.2
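The queue-based pseudo-code can be sketched in Java over an adjacency list; `TopoSortDemo` and the cycle-detection exception are illustrative choices, not from the textbook.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Queue-based topological sort over an adjacency list: adj.get(v) holds
// the vertices adjacent to v. Runs in O(|V| + |E|).
public class TopoSortDemo {
    public static int[] topSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> edges : adj)            // count incoming edges
            for (int w : edges)
                indegree[w]++;

        Queue<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0)                  // start with indegree-0 vertices
                q.add(v);

        int[] order = new int[n];
        int count = 0;
        while (!q.isEmpty()) {
            int v = q.remove();
            order[count++] = v;
            for (int w : adj.get(v))               // "remove" v's outgoing edges
                if (--indegree[w] == 0)
                    q.add(w);                      // enqueue as soon as indegree hits 0
        }
        if (count != n)
            throw new IllegalStateException("graph has a cycle");
        return order;
    }
}
```

Each vertex is enqueued once and each edge decrements one counter, which is where the O(|E| + |V|) bound comes from.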

Agenda for Lecture 25


Ch.9 Graph Algorithms
Shortest-Path Algorithms
Unweighted Shortest Paths
Dijkstra's Algorithm
Graphs with Negative Edge Costs
Acyclic Graphs

Book-Keeping
Slide deck uploaded to Canvas
Midterm grades

WEEK #13
LECTURE #25

Shortest-Path Algorithms

Assume there is a cost associated with each edge.


The cost of a path is the sum of the cost of each edge on
the path.
We consider the weighted path length in our problem.

9.3

Shortest-Path Algorithms
Find the least-cost path from a distinguished
vertex s to every other vertex in the graph.

The shortest weighted path from v1 to v6 has a cost of 6.

9.3

Shortest-Path Algorithms
A graph with a negative-cost cycle.
The shortest path from v5 to v4 is undefined.
Negative-Cost Cycle

9.3

Shortest-Path Algorithms
We are going to examine algorithms to solve
four versions of the shortest-path problem:
Solve the unweighted shortest-path problem.
Solve the weighted shortest-path problem if there are no
negative edges.
Solve the weighted shortest-path problem if the graph has
negative edges.
Solve the weighted problem for the special case of acyclic
graphs in linear time.

9.3

Unweighted Shortest Path
An unweighted directed graph G
The unweighted shortest path is clearly a special case of the
weighted shortest-path problem, since we could assign
all edges a weight of 1.

9.3.1

Breadth-First Strategy
Breadth-first search: processing vertices in layers.
The vertices closest to the start are evaluated first,
and the most distant vertices are evaluated last.
Graph after marking the start node as reachable in
zero edges

9.3.1

Breadth-First Strategy
Graph after finding all vertices whose path
length from s is 1.

9.3.1

Breadth-First Strategy
Graph after finding all vertices whose shortest
path is 2

9.3.1

Breadth-First Strategy
Final shortest paths

9.3.1

Translating Strategy into Code


Initial configuration of the table used in the unweighted
shortest-path computation
Keep the tentative distance from
vertex v3 to another vertex in dv.
Keep track of the path in pv.
A vertex becomes known after it
has been processed.
No cheaper path can be found.

Start with 0 as the current


distance.

9.3.1

Translating Strategy into Code


During each iteration,
process an unknown
vertex v whose distance
dv = current distance.
Mark v as known.
For each vertex w
adjacent to v:

Set its distance dw to the


current distance + 1
Set its path pw to v.

Increment the current


distance.

9.3.1

Unweighted Shortest Path
Refined algorithm using a queue

The "known" field
can be discarded.

9.3.1

Unweighted Shortest Path
Refined algorithm using a queue

9.3.1
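The refined queue-based algorithm (with the "known" field discarded) can be sketched as follows; `UnweightedPathDemo` is an illustrative name, and unreachable vertices are marked with -1 here instead of infinity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Breadth-first unweighted shortest paths from source s:
// dist[v] is the fewest edges from s to v, or -1 if unreachable.
public class UnweightedPathDemo {
    public static int[] shortestPaths(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                 // -1 plays the role of infinity
        dist[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();                // dequeuing replaces "known"
            for (int w : adj.get(v))
                if (dist[w] == -1) {           // first visit is the shortest
                    dist[w] = dist[v] + 1;
                    q.add(w);
                }
        }
        return dist;
    }
}
```

The queue processes vertices in layers, exactly the breadth-first strategy of the preceding slides, in O(|V| + |E|) time.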

Weighted Shortest Path


Use a greedy algorithm: Dijkstra's
Algorithm
A greedy algorithm does what appears to be
best at each stage;
It may not always work.

Implementation:
Keep the same information for each vertex;
The information is either known or
unknown
Tentative distance dv
Path information pv
9.3.2

Dijkstra's Algorithm
Initial configuration of the table used in Dijkstra's
algorithm.

9.3.2

Dijkstra's Algorithm
After v1 is declared known.

9.3.2

Dijkstra's Algorithm
After v4 is declared known.

9.3.2

Dijkstra's Algorithm
After v2 is declared known.

V2*:
Not updating V5:
(2+10) > 3

9.3.2

Dijkstra's Algorithm
After v5 and then v3 are declared known.

V3*:
Updating V6:
(3+5) < 9
V5*:
Not updating V7: (3+6) > 5
9.3.2

Dijkstra's Algorithm
After v7 is declared known.

V7*:
Updating V6:
(5+1) < 8

9.3.2

Dijkstra's Algorithm
After v6 is declared known and the algorithm
terminates.

9.3.2

Dijkstra's Algorithm
Pseudocode for Dijkstra's Algorithm

9.3.2
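A common way to realize the pseudocode is with a priority queue of (distance, vertex) entries; popping a stale entry and skipping it plays the role of the "known" check. The class and `Edge` type here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra's algorithm for non-negative edge costs, using a priority
// queue of {distance, vertex} pairs; stale entries are skipped.
public class DijkstraDemo {
    public static class Edge {
        final int to;
        final int cost;
        public Edge(int to, int cost) { this.to = to; this.cost = cost; }
    }

    public static long[] dijkstra(List<List<Edge>> adj, int s) {
        long[] dist = new long[adj.size()];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[s] = 0;
        PriorityQueue<long[]> pq =
            new PriorityQueue<>((x, y) -> Long.compare(x[0], y[0]));
        pq.add(new long[]{0, s});
        while (!pq.isEmpty()) {
            long[] top = pq.remove();
            long d = top[0];
            int v = (int) top[1];
            if (d > dist[v]) continue;          // stale entry: v already known
            for (Edge e : adj.get(v))           // relax edges out of v
                if (dist[v] + e.cost < dist[e.to]) {
                    dist[e.to] = dist[v] + e.cost;
                    pq.add(new long[]{dist[e.to], e.to});
                }
        }
        return dist;
    }
}
```

The greedy step is the `pq.remove()`: the unknown vertex with the smallest tentative distance is declared known, which is safe only because edge costs are non-negative.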

Graphs with Negative Edge Costs


Dijkstra's algorithm does not
work on graphs with
negative edge costs.
Adding a constant to each
edge cost won't work.
A combination of the
weighted and unweighted
algorithms will solve the
problem, but at the cost of an
increase in running time.

Not using the


concept of
known vertices

Pseudocode for the weighted


shortest-path algorithm with
negative edge costs
9.3.3
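One way to realize this combination is queue-based relaxation (in essence the Bellman-Ford approach): relax edges as in the weighted algorithm, drive the process with a queue as in the unweighted one, and drop the notion of known vertices. A sketch, assuming the graph has no negative-cost cycles:

```python
from collections import deque

def negative_weighted_paths(adj, s):
    # adj: vertex -> list of (neighbor, cost); costs may be negative,
    # but the graph must not contain a negative-cost cycle.
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    q = deque([s])
    on_queue = {s}                # no 'known' set: a vertex may re-enter
    while q:
        v = q.popleft()
        on_queue.discard(v)
        for w, c in adj[v]:
            if dist[v] + c < dist[w]:
                dist[w] = dist[v] + c
                if w not in on_queue:
                    q.append(w)
                    on_queue.add(w)
    return dist
```

Because a vertex can be dequeued repeatedly, the worst-case running time rises to O(|E| · |V|).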

Agenda for Lecture 26


Book-Keeping
Ch.9 Graph Algorithms
Acyclic Graphs
Shortest-Path Example
Network Flow Problems
A Simple Maximum-Flow Algorithm

Minimum Spanning Tree


Prim's Algorithm
Kruskal's Algorithm

Applications of Depth-First Search



CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #13
LECTURE #26

Acyclic Graphs
Vertex Selection Rule:
If the graph is known to be acyclic, we can improve
Dijkstra's algorithm by changing the order in which
vertices are declared known.
The new rule is to select vertices in topological
order.
It works because when a vertex is selected, its
distance can no longer be lowered, since by the
topological ordering rule it has no incoming edges
emanating from unknown nodes.
9.3.4
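The selection rule above can be sketched by computing a topological order first (Kahn's indegree-counting algorithm is one way to get it) and then relaxing edges in that order:

```python
def dag_shortest_paths(adj, s):
    # adj: vertex -> list of (neighbor, cost); the graph must be acyclic.
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w, _ in adj[v]:
            indegree[w] += 1
    order = [v for v in adj if indegree[v] == 0]
    for v in order:                   # Kahn's algorithm: grow the order
        for w, _ in adj[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                order.append(w)
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    for v in order:                   # select vertices in topological order
        if dist[v] == float('inf'):
            continue                  # unreachable from s
        for w, c in adj[v]:
            if dist[v] + c < dist[w]:
                dist[w] = dist[v] + c
    return dist
```

No priority queue is needed, so the whole computation is O(|E| + |V|).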

Critical Path Analysis


Critical Path Analysis is an application of acyclic
graphs.
Like a downhill skiing problem: we want to get from
point a to point b but can only go downhill, so clearly
there are no cycles.

9.3.4

Activity-Node Graph


The activity-node graph shows the time it takes to
complete the activity at each node.
Which tasks can be delayed?

9.3.4

Event-Node Graph


We convert the activity-node graph to an
event-node graph to find the completion time of
the project.

9.3.4

Event-Node Graph


The earliest completion time for node w, working
forward from EC1 = 0, is
ECw = max over edges (v,w) of (ECv + cv,w)

9.3.4

Event-Node Graph


The latest time, LCv, that each event can finish
without affecting the final completion time:
Work backward from LCn = ECn, using
LCv = min over edges (v,w) of (LCw − cv,w)

9.3.4

Event-Node Graph


The slack time for each edge (v,w) in the event-node graph,
Slack(v,w) = LCw − ECv − cv,w, represents the
amount of time that the completion of the corresponding activity can
be delayed without delaying the overall completion.

Earliest completion time, latest completion time, and slack.

9.3.4

Shortest-Path Example

Word ladder problem. For instance: zero → hero → here → hire → fire → five

9.3.6
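The word ladder is an unweighted shortest-path problem over an implicit graph whose vertices are words and whose edges join words differing in exactly one letter. A breadth-first-search sketch, assuming lowercase words of equal length:

```python
from collections import deque
import string

def word_ladder(start, goal, words):
    # BFS over the implicit word graph; prev doubles as the visited set.
    words = set(words) | {start, goal}
    prev = {start: None}
    q = deque([start])
    while q:
        w = q.popleft()
        if w == goal:                     # rebuild the ladder backward
            chain = []
            while w is not None:
                chain.append(w)
                w = prev[w]
            return chain[::-1]
        for i in range(len(w)):           # all one-letter changes of w
            for c in string.ascii_lowercase:
                cand = w[:i] + c + w[i+1:]
                if cand in words and cand not in prev:
                    prev[cand] = w
                    q.append(cand)
    return None                           # no ladder exists
```

Because BFS explores by increasing path length, the first ladder found is a shortest one.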

Model for Breadth-First Search


Think of applying Breadth-First Search when we are
searching for a lost child inside a large building full of
rooms.
Here is our model:

Graph = The large building


Edge = Hallway between rooms
Vertex = Each room

How we search for the lost child:

We start in the room where we last saw the child


We search each room adjacent to the first room and put a tag on the
door to mark the room as searched
Then we search each room adjacent to the rooms we have already
searched
We repeatedly search all the rooms adjacent to the rooms already
searched until we find the child

Model for Depth-First Search


Think of applying Depth-First Search when you are lost in a
maze.
Here is our model:

Graph = Maze
Edge = Path
Vertex = Each intersection in the maze

How we can get out of the maze:

Suppose we have a bag of bread crumbs


Drop bread crumbs to mark your path
Whenever we come to a dead end, we retrace our path by following our
bread crumbs
We continue retracing our path, or backtracking, to an intersection with
an unmarked path
We go down the unmarked path, backtracking the same way when we hit
a dead end again
Repeat the process until we get out of the maze

Network Flow Problems


The problem is to determine the maximum amount of
flow that can pass from s to t.
A graph (left) and its maximum flow:
source

The max flow is 2 + 3 = 5

sink

9.4

Network Flow Problems


A cut in graph G partitions the vertices with s and t in
different groups. The total edge cost across the cut is 5,
proving that a flow of 5 is maximum:

9.4

A Simple Maximum-Flow Algorithm


Initial stages of the graph G, the flow graph Gf, and
the residual graph Gr:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after two units of flow added along s, b,
d, t:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after two units of flow added along s, a,
c, t:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after one unit of flow added along s, a,
d, t; the algorithm terminates:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr if the initial action is to add three units of flow
along s, a, d, t: the algorithm terminates after one
more step with a suboptimal solution: 3 + 1 = 4

9.4.1

A Simple Maximum-Flow Algorithm


Graphs after three units of flow added along s,
a, d, t using the correct algorithm

Undoing the flow
9.4.1

A Simple Maximum-Flow Algorithm


Graphs after two units of flow added along s, b,
d, a, c, t using the correct algorithm

Undoing the flow
9.4.1

A Simple Maximum-Flow Algorithm


The vertices reachable from s in the residual graph
form one side of a cut; the unreachable vertices form the
other side of the cut.

9.4.1
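The correct algorithm above relies on residual back edges that let a later augmenting path undo earlier flow. A sketch, choosing shortest augmenting paths by BFS (the capacity-dict representation is an assumption):

```python
from collections import deque

def max_flow(capacity, s, t):
    # residual[u][v] = remaining capacity on edge u -> v.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # back edge
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        prev = {s: None}
        q = deque([s])
        while q and t not in prev:
            u = q.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in prev:
                    prev[v] = u
                    q.append(v)
        if t not in prev:
            return flow               # no augmenting path: flow is maximum
        path, v = [], t               # collect the path edges
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck   # allow undoing the flow later
        flow += bottleneck
```

Choosing shortest paths (the Edmonds-Karp rule) bounds the number of augmentations by O(|E| · |V|) and avoids the suboptimal 3 + 1 = 4 outcome shown earlier.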

Minimum Spanning Tree


A Minimum Spanning Tree (MST)
Is an acyclic tree
Spans every vertex
Has |V|−1 edges
Has minimum total cost

Add each edge to a Minimum Spanning Tree in


such a way that
It does not create a cycle
It is the least-cost addition

It is a greedy algorithm
9.5

Minimum Spanning Tree


A graph G and its minimum spanning tree:

9.5

Prim's Algorithm
Prim's algorithm after each stage:

9.5.1

Prim's Algorithm
Initial configuration of the table used in Prim's
algorithm for Minimum Spanning Tree:

9.5.1

Prim's Algorithm
The table after v1 is declared known:

9.5.1

Prim's Algorithm
The table after v4 is declared known:

9.5.1

Prim's Algorithm
The table after v2 and then v3 are declared
known:

9.5.1

Prim's Algorithm
The table after v7 is declared known:

9.5.1

Prim's Algorithm
The table after v6 and then v5 are selected
(Prim's algorithm terminates).

9.5.1
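Prim's algorithm can be sketched with a priority queue holding candidate edges out of the known set. This is an illustrative variant; the textbook's version updates the dv/pv table and scans for the smallest unknown entry instead:

```python
import heapq

def prim(adj, start):
    # adj: vertex -> list of (neighbor, weight); undirected, so each
    # edge appears in both adjacency lists.
    known = {start}
    tree = []                         # MST edges as (u, v, weight)
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(known) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in known:
            continue                  # this edge would create a cycle
        known.add(v)
        tree.append((u, v, w))
        for x, wx in adj[v]:
            if x not in known:
                heapq.heappush(heap, (wx, v, x))
    return tree
```

Each accepted edge is the cheapest way to extend the tree, which is exactly the greedy MST rule.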

Kruskal's Algorithm
Kruskal's is a greedy algorithm using equivalence
classes
First partition the vertices into |V| equivalence classes
Process the edges in order of weight
Add an edge to the Minimum Spanning Tree and combine
two equivalence classes if the edge connects two vertices in
different equivalence classes

9.5.2

Kruskal's Algorithm
Action of Kruskal's algorithm on G:

9.5.2

Kruskal's Algorithm
Kruskal's algorithm after each stage:

9.5.2

Kruskal's Algorithm
Pseudocode for Kruskal's algorithm:

9.5.2

Applications of Depth-First Search


Pseudocode for the depth-first search template:

Tips for graph traversal algorithms

Similar to tree traversal: visit each vertex in a particular order


It may not be possible to reach all vertices from the start vertex
The graph may contain cycles, and we should not go into an infinite
loop
The problem is solved by marking each vertex as visited when it is
visited and avoiding revisiting marked vertices.

9.6
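The template with marking can be sketched as:

```python
def depth_first_search(adj, start):
    # Returns the vertices in the order they are visited.
    order, visited = [], set()

    def visit(v):
        visited.add(v)            # mark v before recursing, so a cycle
        order.append(v)           # cannot cause an infinite loop
        for w in adj[v]:
            if w not in visited:  # skip vertices already marked
                visit(w)

    visit(start)
    return order
```

Vertices not reachable from the start vertex are simply never visited; a full traversal would restart the search from each unvisited vertex.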

Undirected Graphs
An undirected graph and a depth-first search of the graph:
9.6.1

Undirected Graphs
Depth-first search of the graph, with the steps numbered
(legend: forward edge; already marked; return)

9.6.1

Biconnectivity
A connected undirected graph is biconnected if there are no vertices
whose removal disconnects the rest of the graph.
If a graph is not biconnected, the vertices whose removal would
disconnect it are called articulation points.
A graph with articulation points C and D, and the depth-first tree with
Num and Low values:
Low(v) is the minimum of
Num(v)
the lowest Num(w) among all back edges (v,w)
the lowest Low(w) among all tree edges (v,w)

Articulation point (the root is a special case):
some child w has Low(w) >= Num(v)
9.6.2

Biconnectivity
The depth-first tree that results if the depth-first search starts at C:
The root is a special case: it is an articulation
point only if it has more than one child.
Articulation point test for other vertices:
some child w has Low(w) >= Num(v)

9.6.2

Biconnectivity
Routine to assign Num to the vertices:

9.6.2

Biconnectivity
Pseudocode to compute Low and to test for articulation
points (the test for the root is omitted):

9.6.2

Biconnectivity
Testing for articulation points in one depth-first search

9.6.2
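The Num/Low computation and the articulation-point test can be sketched in a single depth-first search. The adjacency-list format is an assumption, and the simple parent check is not safe for multigraphs:

```python
def articulation_points(adj, root):
    # Num = preorder number; Low = lowest Num reachable via tree edges
    # plus at most one back edge.
    num, low, points = {}, {}, set()
    counter = [1]

    def dfs(v, parent):
        num[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[v]:
            if w not in num:              # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if v != root and low[w] >= num[v]:
                    points.add(v)         # removing v cuts off w's subtree
            elif w != parent:             # back edge
                low[v] = min(low[v], num[w])
        if v == root and children > 1:    # the root is the special case
            points.add(v)

    dfs(root, None)
    return points
```

A path a-b-c makes b an articulation point, while a triangle has none.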

Euler Circuits
A Puzzle:

Reconstruct these figures using a pen, drawing each line exactly once.
The pen may not be lifted from the paper while the drawing is being
performed.
As an extra challenge, make the pen finish at the same point at which it
started.

9.6.3

Euler Circuits
Conversion of the puzzle to a graph:

9.6.3

Euler Circuits
Graph for Euler circuit problem:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 10, 5:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11, 10, 7, 9,
3, 4, 10, 5

9.6.3

10. Algorithm Design Techniques

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Quiz 1

Lists, Stacks, and Queues


Trees
Hashing
Priority Queues
Sorting

Assignment 2

Quiz 2
Quiz 3

Assignment 3

Mid-Term

The Disjoint Set Class


Graph Algorithms
Algorithm Design Techniques

This Lecture

Assignment 4


Quiz 4
Final Exam

Agenda for Lecture 27


Book-Keeping
Ch.10 Algorithm Design Techniques
Introduction
Greedy Algorithms
A Simple Scheduling Problem


WEEK #14
LECTURE #27

Introduction
Lists, Stacks, and Queues
Trees
Hashing
Priority Queues

We have been concerned with


the efficient implementation of
algorithms from Chapters 3 to 9.

Sorting
The Disjoint Set Class
Graph Algorithms
Algorithm Design Techniques


In Chapter 10, we will discuss
the design of algorithms and see
the general approaches.
10.0

Classification of Algorithm Design Techniques


Greedy
Algorithms

A Simple Scheduling Problem


Huffman Codes
Approx. Bin Packing

Divide and
Conquer

Running Time
Closest-Points Problem
Selection Problem

Dynamic
Programming

Using a Table Instead of Recursion


Ordering Matrix Multiplications
Optimal Binary Search Tree
All-Pairs Shortest Path

Randomized
Algorithms

Random Number Generators


Skip Lists
Primality Testing

Backtracking
Algorithms

The Turnpike Reconstruction Problem


Games
10.0

Greedy Algorithms
We have already seen three greedy algorithms in
the previous chapter: Dijkstra's, Prim's, and
Kruskal's.
Greedy algorithms always choose the local optimum,
a "take what you can get now" strategy, instead of
seeking the global optimum.
For example: the Coin Changing Problem
We repeatedly dispense the largest denomination.
To give out $17.61, we give out a $10 bill, a $5 bill, two $1
bills, a half dollar, a dime, and a penny.
10.1
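The coin-changing example can be sketched as follows (amounts are in cents to avoid floating point; the denomination list is the U.S. set used above):

```python
def make_change(amount_cents, denominations=(1000, 500, 100, 50, 25, 10, 5, 1)):
    # Greedily dispense the largest denomination that still fits.
    used = []
    for d in denominations:
        count, amount_cents = divmod(amount_cents, d)
        used.extend([d] * count)
    return used
```

For U.S. denominations the greedy choice happens to be optimal; for arbitrary coin systems it can fail, which is exactly the "may not always work" caveat of greedy algorithms.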

A Simple Scheduling Problem


Suppose we have four jobs with various running
times; the total cost, C, of the schedule is

C = Σk (N − k + 1) t_ik = (N + 1) Σk t_ik − Σk k·t_ik

Only the second sum affects the total cost.


We want this sum to be the maximum, which is
achieved by scheduling the shortest jobs first.
10.1.1

A Simple Scheduling Problem


Jobs and times, as well as Schedule #1.
The average completion time is (15+23+26+36)/4 = 25.

10.1.1

A Simple Scheduling Problem


Schedule #2 (optimal)
Average completion time = (3+11+21+36)/4 = 17.75

10.1.1

The Multiprocessor Case


As an example of the multiprocessor case, we have 9 jobs running on 3
processors.
The total time is 165, for an average of 165/9 = 18.33.
The algorithm is to start jobs in sorted order, cycling through processors.
10.1.1

The Multiprocessor Case


There is a second optimal solution for the multiprocessor
case.
The average completion time is again (3 + 5 + 6 + 14 + 15 + 20 + 30
+ 34 + 38)/9 = 165/9 = 18.33.

10.1.1

The Multiprocessor Case


We can also consider minimizing the final completion time,
which here is 34.
Minimizing the final completion time is apparently much
harder than minimizing the mean completion time.

10.1.1

Agenda for Lecture 28


Book-Keeping
Ch.10 Algorithm Design Techniques
Greedy Algorithms
Huffman Codes

Divide and Conquer


Closest-Points Problem

WEEK #14
LECTURE #28

Dynamic Programming
Using a Table Instead of Recursion
Optimal Binary Search Tree

Agenda for Lecture 29


Ch.10 Algorithm Design Techniques
Greedy Algorithms
Huffman Codes

Divide and Conquer

WEEK #15
LECTURE #29

Closest-Points Problem

Dynamic Programming
Using a Table Instead of Recursion

Randomized Algorithms
Backtracking Algorithms
The Turnpike Reconstruction Problem


Huffman Codes
The Huffman code is a greedy algorithm used for file compression.
For example, a file contains only a, e, i, s, t, spaces, and newlines.
This file requires 174 bits to represent, since each character
requires three bits.

10.1.2

Huffman Codes
In large files, there is usually a big disparity between
the most frequent and least frequent characters.
Huffman codes allow the code length to vary from
character to character and ensure that frequently
occurring characters have short codes.
Huffman codes are efficient in representing data
(resulting in the removal of data redundancy).
If all the characters occur with the same frequency,
there are not likely to be any savings.

Huffman Codes
The binary code that represents the alphabet can be
represented by a binary tree.
The representation of each character can be found by starting at
the root and recording the path, using a 0 to indicate the left
branch and a 1 to indicate the right branch.
For instance, s is reached by going left, then right, and finally
right. This is encoded as 011.

Huffman Codes
Since the newline is an only child, we can place the
newline one level higher, at its parent.
This saves 1 bit per newline, and the
new tree has a cost of 173.

10.1.2

Prefix Code
If the characters are placed only at the leaves, any
sequence of bits can always be decoded
unambiguously.
For instance, suppose
010011110001011000100011 is the encoded
string. 0 is not a character code, 01 is not a
character code, but 010 represents i.
It does not matter if the character codes are of
different lengths, as long as no character code is a
prefix of another character code.
10.1.2

Prefix Code
The optimal prefix code:

10.1.2

Huffman's Algorithm
Assume the number of characters is C.
Huffman's Algorithm:
Maintain a forest of trees. The weight of a tree is
equal to the sum of the frequencies of its leaves.
10.1.2

Huffman's Algorithm
C−1 times, select the two trees, T1 and T2, of
smallest weight, breaking ties arbitrarily, and form
a new tree with subtrees T1 and T2.
Huffman's algorithm after the first merge:

10.1.2
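The merging loop can be sketched with a binary heap acting as the forest. The tie-breaking counter is only an implementation detail (the algorithm breaks ties arbitrarily), and trees are represented as nested tuples with characters at the leaves:

```python
import heapq
from itertools import count

def huffman_codes(freq):
    # Build the Huffman tree by repeatedly merging the two lightest
    # trees, then derive codes by walking it (0 = left, 1 = right).
    tie = count()                     # keeps heap entries comparable
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:              # C-1 merges in total
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):   # internal node
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:                         # leaf: characters only at leaves,
            codes[tree] = prefix or '0'   # so the result is a prefix code
    walk(heap[0][2], '')
    return codes
```

The total cost of the encoding is the sum of frequency times code length over all characters.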

Huffman's Algorithm
Huffman's algorithm after the second merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the third merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the fourth merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the fifth merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the final merge:

10.1.2

Huffman's Algorithm
There are two details that must be considered.
Transmission of the Code Book:
The encoding information must be transmitted at the
start of the compressed file, since otherwise it will be
impossible to decode.

Two-Pass Algorithm:
The first pass collects the frequency data and the
second pass does the encoding.
10.1.2

Divide and Conquer


Divide-and-conquer algorithms consist of two parts:
Divide: Smaller problems are solved recursively
Conquer: The solution to the original problem is then
formed from the solutions to the subproblems.

Past Examples:
Chapter 2: Maximum Subsequence Sum Problem with an O(N
log N) solution
Chapter 4: Linear-time tree traversal strategies (preorder
& postorder traversal)
Chapter 7: Mergesort and quicksort
10.2

Closest-Points Problem
In the Closest-Points Problem, we are required to find the
closest pair of points.
Below is a small point set.

10.2.2

Closest-Points Problem
We can compute dL and dR recursively. Then how about dC?
P is partitioned into PL and PR; the shortest distances are shown.

10.2.2

Closest-Points Problem
Let δ = min(dL, dR). We only need to compute dC if dC improves on δ.
Below is the two-lane strip containing all points considered for dC.

10.2.2

Closest-Points Problem
For large point sets that are uniformly distributed,
the number of points expected to be in the
strip is very small.
In this case, we can use a brute-force calculation of
min(δ, dC).

10.2.2

Closest-Points Problem
In the worst case, all the points could be in the strip.
We need a better algorithm, using a refined calculation
of min(δ, dC).

10.2.2

Closest-Points Problem
For p3, only p4 and p5 are considered in the second
for loop, since only they lie in the strip within vertical
distance δ.

10.2.2

Dynamic Programming
A problem that can be mathematically expressed
recursively can also be expressed as a recursive
algorithm.
But a direct translation of the
recursive formula may not give an efficient
program.
Dynamic programming rewrites the recursive
algorithm as a nonrecursive algorithm that
systematically records the answers to the
subproblems in a table, so each subproblem
is solved only once.
10.3

Using a Table Instead of Recursion


Inefficient algorithm to compute Fibonacci numbers:

FN depends on FN-1 and FN-2


FN-1 depends on FN-2 and FN-3
FN-2 depends on FN-3 and FN-4

so the same values are recomputed over and over.

10.3.1

Using a Table Instead of Recursion


Trace of the recursive calculaVon of Fibonacci numbers

10.3.1

Using a Table Instead of Recursion


A linear algorithm is more efficient: remove the
recursive calls and fill in a table instead.

10.3.1
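The table-based Fibonacci computation can be sketched as:

```python
def fib_table(n):
    # Fill the table bottom-up: each F_i is computed exactly once,
    # giving O(n) time instead of exponential recursion.
    if n <= 1:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Since only the two previous entries are ever read, the table can even be reduced to two variables.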

Using a Table Instead of Recursion


Another recursive example:

10.3.1

Using a Table Instead of Recursion


Trace of the recursive calculaVon in eval:

10.3.1

Using a Table Instead of Recursion


ImplementaVon with a table:

10.3.1

Backtracking Algorithms
A backtracking algorithm usually doesn't have
good performance, but in many cases it gives
significant savings over a brute-force exhaustive
search.
For example, O(N^2) is not good, but it is
significantly better than an O(N^5) algorithm.
Backtracking example: the Turnpike
Reconstruction Problem
10.5

The Turnpike ReconstrucVon Problem


The turnpike reconstrucVon problem is to
reconstruct a point set from the distances.
The given algorithm typically runs in O(N2log N)
but can take exponenVal Vme in the worst case.
As an example:

10.5.1

The Turnpike ReconstrucVon Problem


The turnpike reconstrucVon problem is to
reconstruct a point set from the distances.
The given algorithm typically runs in O(N2log N)
but can take exponenVal Vme in the worst case.
As an example:

10.5.1

The Turnpike ReconstrucVon Problem


IDI = 15, N=6.
Clearly, we can put x6 = 10 onto the Vmeline.
We remove 10 from D and the remaining
distances are as shown in below:

10.5.1

The Turnpike ReconstrucVon Problem


The largest remaining distance is 8, which means that
either x2 = 2 or x5 = 8.
By symmetry, both choices lead to soluVons (which
are mirror images of each other).
We can remove the distances x6-x5 = 2 and x5-x1 = 8
from D.

10.5.1

The Turnpike ReconstrucVon Problem


Since 7 is the largest value in D, either x4 = 7 or x2 = 3.
If x4 = 7, then x6 7 = 3 and x5 7 = 1 must also be present in D.
=> And this is TRUE.
If x2 = 3, then 3 x1 = 3 and x5 3 = 5 must also be present in D.
=> And this is TRUE.
So this step is not obvious. Trying that rst choice x4 = 7.
We can remove the distances x6 7 = 3 and x5 7 = 1 from D.

10.5.1

The Turnpike ReconstrucVon Problem


Now 6 is the largest value in D, either x3 = 6 or x2 = 4.
If x3 = 6 , then x4 x3 = 1. => And this is IMPOSSIBLE.
If x2 = 4, then x2 x0 = 4 and x5 x2 = 4 => And this is
IMPOSSIBLE.
So we have to backtrack and determine that x4 = 7 wont work.
Now trying x2 = 3 for 7.

10.5.1

The Turnpike ReconstrucVon Problem


Once again, for 6, we have to choose between x4 = 6 and x3 = 4.
x3 = 4 is IMPOSSIBLE because D only has one occurrence of 4
and this choice needs two 4s.
If x4 = 6, then we need 6, 2, 4, and 3. That will work.

10.5.1

The Turnpike ReconstrucVon Problem


The only remaining choice is to assign x3 = 5.
This leaves D empty and we have a soluVon.

10.5.1

The Turnpike ReconstrucVon Problem


Decision tree for the worked turnpike reconstruction example

10.5.1

The Turnpike ReconstrucVon Problem


Turnpike reconstrucVon algorithm: driver rouVne (pseudocode)

10.5.1
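The driver and the backtracking step can be sketched as follows. This is a compact illustrative variant of the textbook routine, with the multiset D kept in a Counter; it tries the largest remaining distance as a point measured from either end and undoes the choice on failure:

```python
from collections import Counter

def turnpike(dists):
    # dists: the multiset D of all pairwise distances.
    d = Counter(dists)
    width = max(d)                    # x1 = 0 and xN = width are forced
    points = {0, width}
    d[width] -= 1
    if d[width] == 0:
        del d[width]

    def place(x):
        # Try x as the next point; undo everything on failure.
        need = Counter(abs(x - p) for p in points)
        if any(d[k] < v for k, v in need.items()):
            return False              # some required distance is missing
        for k, v in need.items():
            d[k] -= v
            if d[k] == 0:
                del d[k]
        points.add(x)
        if solve():
            return True
        points.remove(x)              # backtrack
        for k, v in need.items():
            d[k] += v
        return False

    def solve():
        if not d:
            return True               # D exhausted: points is a solution
        y = max(d)                    # largest distance is measured from
        return place(y) or place(width - y)   # one end or the other

    return sorted(points) if solve() else None
```

On the worked example the routine reproduces the same search, including the failed x4 = 7 branch and the eventual solution.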
