
Introduction

Data Structures and Algorithms


Angus Yeung, Ph.D.

Agenda for Lecture 1

WEEK #1
LECTURE #1

Book-keeping

About the Instructor

Assignments, Grading Policy, Add Code, etc.

Introduction
Why study Data Structures and Algorithms
Working with large data sets

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

About Dr. Yeung

Senior Software Manager, Intel Corp.
Manages teams developing Android and iOS mobile software for wearable devices
Ph.D., M.S., B.S. in Electrical & Computer Engineering
Berkeley MBA
Lives with his wife and 3 boys in Palo Alto

Assessment
Class Profile
Sophomore, Junior, Senior, Open University, etc.

What do you know about CS146?
What do you want to get out of this class?
What are your expectations of your class instructor?
Have you taken CS146 before?

Class Climate
Work really hard: this is a challenging class!
Attend each class, because
Not everything I teach will be available on Canvas
I will go over some selected problem sets in class
Those students who skipped my last class all failed the class

No cellphones in class
Don't sleep in class
No web browsing in class

Assignments
Assignments will include both written and programming problem sets
Do your own homework assignments, as the concepts will be tested in quizzes/exams
All homework assignments will be submitted electronically on Canvas
As such, no late or make-up assignments will be graded

Adding to this Class

The class is full, but I may be able to add a very few students to this class.
I will check a couple of things before I pass out an add code to a student:
Pre-requisites?
Attendance in the first few lectures?
Senior or graduating students?
Repeating CS146?

However, I won't pass out any add codes in the first week.

Getting a Recommendation
You must earn an A- or better grade in this class if you want to ask me for a recommendation letter.
Don't add me on Facebook, but I can consider your invitation to connect on LinkedIn after knowing you for some time in this class.
It is okay to ask me questions about the hi-tech industry.

Textbook
We will use "Data Structures and Algorithm Analysis in Java" by Mark Weiss in this class.

Slide Deck
PowerPoint slides for each lecture will follow the textbook very closely.

(Slide annotations used throughout: Highlight, Comment, Textbook Section, Math)

Grading Policy
The percentage weights assigned to class assignments, the group project, and the final exam are listed below:

Grading Policy
Grades will be assigned as described below. This scale may be adjusted once the final exam has been graded to provide a letter grade distribution that matches the expected average for this class.

Course Schedule

Course Structure

Foundation
Attend lectures
Read book chapters

Reinforcement
Written class assignments
Java programming assignments
Quiz after each assignment

Integration
Review course material
Study for exams

Introduction to CS146
Algorithm Analysis
Assignment 1
Lists, Stacks and Queues
Assignment 2
Trees
Hashing
Assignment 3
Priority Queues
Mid-Term
Sorting
The Disjoint Set Class
Assignment 4
Graph Algorithms
Final Exam

Purpose of this Course

This course addresses two important aspects of computer science:
Data Structures
Methods of organizing large amounts of data
Algorithm Analysis
Estimation of the running time of algorithms

Selection Problem
Determine the kth largest of a group of N numbers
Algorithm 1:
Read the N numbers into an array
Sort the array in decreasing order by some algorithm
Return the element in position k

Algorithm 2:
Read the first k elements into an array
Sort them in decreasing order
Read each remaining element one by one
Ignore the element if it is smaller than the kth element
Otherwise, place the element in the array, bumping one element out

Which algorithm is better?
Is either algorithm good enough?
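A minimal Java sketch of Algorithm 1 (class and method names are my own, not from the slide):

```java
import java.util.Arrays;

// Algorithm 1: read the N numbers into an array, sort, then
// return the element in position k (1-based, k-th largest).
public class Selection {
    public static int kthLargest(int[] numbers, int k) {
        int[] a = numbers.clone();   // read the N numbers into an array
        Arrays.sort(a);              // ascending sort...
        return a[a.length - k];      // ...so the kth largest is the kth from the end
    }
}
```

Algorithm 2 keeps only k elements in memory, which matters when N is huge; the running-time question raised above is taken up in Chapter 2.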

Word Puzzle Problem

Given a 2-D array of letters and a list of words, find the words in the puzzle
Algorithm 1:
Check each ordered triple (row, column, orientation)
Use lots of nested for loops to find matches

Algorithm 2:
Check each ordered quadruple (row, column, orientation, character#) that doesn't run off an end of the puzzle
Use lots of nested for loops to find matches

Are both algorithms practical enough if the word list is a dictionary?
Is it possible, even with a large word list, to solve the problem in a matter of seconds?

Working with Large Data Sets

Writing a working program is not good enough if the program is to be run on a large data set
Running time becomes an issue:
How do we estimate the running time of a program for large inputs?
How do we compare the running times of two programs without actually coding them?
How can we determine program bottlenecks?
How do we use optimization techniques to improve the speed of a program?

Agenda for Lecture 2

WEEK #1
LECTURE #2

Book-Keeping
Mathematics Review
Exponents, Logarithms
Series, Harmonic Numbers, Euler's Constant
Modular Arithmetic
Proof by Induction
Recursive Functions

Generic Programming (not using Generics)
Using Object for Genericity
Wrappers for Primitive Types
Using Interface Types for Genericity
Compatibility of Array Types

Generic Programming (Using Generics)
Simple Generic Classes and Interfaces
Autoboxing / Unboxing
The Diamond Operator, Wildcards with Bounds
Generic Static Methods, Type Bounds

Stay Tuned on Canvas

Announcements
New CS146 classes
Reading List

Files Uploaded:
Slide deck for Chapter 1
Green Sheet

Upcoming:
Assignment 1 will be posted
Revised slide deck for Chapter 1

New CS146 Class Added

New CS146 Classes Added:
A new CS146 section taught by Girish on TT 7:30-8:45 AM for Spring 2015.
Another CS146 class, TT noon-1:55 PM, taught by Evan this summer.

Already enrolled in my CS146 class:
Do nothing
Consider other sections if they work out better for you

Know anyone who is still trying to add CS146?
Let the person know about this new CS146 section.

Waiting List for Sections 4 and 7:
34 students on my waiting list for Sections 4 and 7.
14 students are graduating seniors or seniors

Readings for Lectures #1 & #2

Readings for Lecture #1:
Section 1.1 "What's the Book About"

Readings for Lecture #2:
Section 1.2 "Mathematics Review"
Section 1.3 "A Brief Introduction to Recursion"
Section 1.4 "Implementing Generic Components Pre-Java 5"
Section 1.5 "Implementing Generic Components Using Java 5 Generics" (excluding 1.5.7 and 1.5.8)

Mathematics Review
Exponents

Logarithms
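The formulas under these headings did not survive extraction; the standard identities covered in a typical review (e.g., Weiss Section 1.2) are:

```latex
X^A X^B = X^{A+B} \qquad \frac{X^A}{X^B} = X^{A-B} \qquad (X^A)^B = X^{AB} \qquad 2^N + 2^N = 2^{N+1}
```

```latex
X^A = B \iff \log_X B = A \qquad \log_A B = \frac{\log_C B}{\log_C A} \qquad \log AB = \log A + \log B \qquad \log X < X \text{ for all } X > 0
```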


Mathematics Review
Series
Geometric Series: A^0 + A^1 + ... + A^N = (A^(N+1) - 1)/(A - 1)
Arithmetic Series: 1 + 2 + ... + N = N(N+1)/2
Harmonic Numbers: H_N = 1 + 1/2 + ... + 1/N, with H_N ≈ ln N + γ
Euler's Constant: γ = lim (H_N - ln N) ≈ 0.57721566

The values H_N - ln N approach γ:
H1 - ln 1 = 1,
H2 - ln 2 = 0.806852...,
H3 - ln 3 = 0.734721...,
H4 - ln 4 = 0.697038...,
H5 - ln 5 = 0.673895...,
H6 - ln 6 = 0.658240...,
H7 - ln 7 = 0.646946...,
H8 - ln 8 = 0.638415...,
H9 - ln 9 = 0.631743...

Mathematics Review
Modular Arithmetic
A ≡ B (mod N) if N divides A - B
Example: 81 ≡ 61 ≡ 1 (mod 10)

If N is a prime number:
ab ≡ 0 (mod N) is true if and only if a ≡ 0 (mod N) or b ≡ 0 (mod N)
ax ≡ b (mod N) has a unique solution (for a not ≡ 0)
x² ≡ a (mod N) has either two solutions or no solutions

Mathematics Review
Proof by Induction
Prove that the Fibonacci numbers
F0 = 1, F1 = 1, F2 = 2, F3 = 3, F4 = 5, ..., Fi = Fi-1 + Fi-2
satisfy Fi < (5/3)^i, for i ≥ 1.
Base Case: F1 = 1 < 5/3 and F2 = 2 < 25/9
Inductive Hypothesis: We assume that the theorem is true for i = 1, 2, ..., k
Proof: By definition, Fk+1 = Fk + Fk-1
Use the inductive hypothesis on the right-hand side:
Fk+1 < (5/3)^k + (5/3)^(k-1)
     = (3/5)(5/3)^(k+1) + (3/5)^2 (5/3)^(k+1)
     = (3/5)(5/3)^(k+1) + (9/25)(5/3)^(k+1)
     = (3/5 + 9/25)(5/3)^(k+1)
     = (24/25)(5/3)^(k+1)
     < (5/3)^(k+1)

Mathematics Review
Proof by Induction
Prove that:
Base Case: the theorem is true when N = 1
Inductive Hypothesis: We assume that the theorem is true for 1 ≤ k ≤ N
Proof:
We have:
Use the inductive hypothesis on the right-hand side:

Mathematics Review
Proof by Counterexample
Prove that the statement "Fk ≤ k²" is false
The easiest way to prove this is to compute F11 = 144 > 121 = 11².

Proof by Contradiction
Prove that there is an infinite number of primes.
1. Assume that the theorem is false, so there is some largest prime Pk.
2. Show that this assumption implies that some known property is false.
Let P1, P2, ..., Pk be all the primes in order and consider N = P1 P2 P3 ... Pk + 1
Clearly, N is larger than Pk, so by assumption N is not prime.
However, none of P1, P2, ..., Pk divides N exactly, because there will always be a remainder of 1.
This is a contradiction, because every number is either prime or a product of primes.
3. Hence the original assumption was erroneous.
This implies that the theorem is true.

Recursive Functions
Consider the following function:
f(0) = 0 and
f(x) = 2f(x-1) + x^2

How do we implement it in Java?
Java allows functions to be recursive
What a bad idea! For illustrative purposes only

Fundamental rules of recursion:
1. Must establish some base cases
These set up the exit conditions
2. Must make progress toward a base case
For example, f(-1) will not converge
3. Assume that all the recursive calls work
So we don't need to mind the details of the book-keeping arrangements
4. Must never duplicate work by solving the same instance of a problem in separate recursive calls
The so-called compound interest rule
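A direct Java translation of the definition above (for illustration only, as the slide warns):

```java
// f(0) = 0 and f(x) = 2 f(x-1) + x^2, implemented recursively.
// The base case f(0) stops the recursion; each call makes
// progress toward it by recursing on x - 1.
public class RecDemo {
    public static long f(int x) {
        if (x == 0)
            return 0;                           // base case
        return 2 * f(x - 1) + (long) x * x;     // recursive call
    }
}
```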

Generic Implementation
The generic mechanism promotes code reuse, an important aspect of object-oriented programming
Java didn't support generic implementation directly until Java 5
Pre-Java 5: generic methods and classes can be implemented in Java using the principle of inheritance
Use Object for Genericity
Use Interface Types for Genericity

Using Object for Genericity

We can use an appropriate superclass, i.e., the Java Object class, as the generic type
All class objects inherit from the Java Object class, explicitly or implicitly (this excludes primitive types)
To access a specific method of the object, we must downcast to the correct type
Using Object as a generic type works only if the operations that are being performed can be expressed using only methods available in the Object class.
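A sketch of an Object-based generic container in the spirit of this slide (the slide's own code did not survive extraction; the class name here is my own):

```java
// An Object-based "generic" cell: it can store any object,
// but the caller must downcast on the way out.
public class ObjectCell {
    private Object storedValue;

    public Object read() { return storedValue; }
    public void write(Object x) { storedValue = x; }
}
```

Usage requires a downcast: `String s = (String) cell.read();` which the compiler cannot check for us.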

Using Object for Genericity

Some useful Object methods:
String toString()
boolean equals(Object obj)
Object clone()
int hashCode()

Wrappers for Primitive Types

Java provides a wrapper class for each of the eight primitive types (which are not compatible with Object)
For example, the wrapper class for the int type is Integer
Each wrapper object is immutable (meaning its state can never change after the object is constructed)
As with other Java classes, the Object class is the superclass of each wrapper class
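A minimal round trip through the Integer wrapper, matching the description above:

```java
// Wrap an int into an immutable Integer, then read it back
// with intValue(). The value is fixed when the object is created.
public class WrapperDemo {
    public static int roundTrip(int x) {
        Integer boxed = Integer.valueOf(x);  // wrap the primitive
        return boxed.intValue();             // unwrap it again
    }
}
```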

Wrappers for Primitive Types

Since Integer is immutable, its value (a primitive type) is assigned via the constructor.
intValue() is a method of Integer.

Using Interface Types

Must implement the Comparable interface
Invoke the concrete compareTo method
Objects must be compatible, e.g., share the same superclass Shape
Primitives cannot be passed as Comparables, but the wrappers can
Covariant data types: be careful with them!
May not work if a class can't implement the needed interface (e.g., a library class or a final class)
It is not required that the interface be a standard library interface.
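A pre-generics sketch of this idea (the slide's code image is missing; this is a typical findMax using the raw Comparable interface, so primitives must be passed as wrappers):

```java
// Pre-Java 5 style: every element must implement Comparable,
// and we invoke the concrete compareTo method on it.
public class MaxFinder {
    public static Comparable findMax(Comparable[] arr) {
        int maxIndex = 0;
        for (int i = 1; i < arr.length; i++)
            if (arr[i].compareTo(arr[maxIndex]) > 0)
                maxIndex = i;
        return arr[maxIndex];
    }
}
```

The raw Comparable type compiles with warnings under Java 5 and later; the generics version with type bounds (Section 1.5) removes both the warnings and the casts.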

Compatibility of Array Types

Java arrays are type-compatible; this is known as a covariant array type.
For example: covariance of arrays is needed so Lines 29 & 30 can compile.

But this may sometimes cause type confusion.
Assume that Employee IS-A Person and Student IS-A Person
Compiles: arrays are compatible.
Compiles: Student IS-A Person.

We have a type confusion because Student IS-NOT-A Employee; at run time the store fails with an ArrayStoreException (no ClassCastException is involved, since the compiler inserted no cast).
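The type confusion can be demonstrated directly (Person, Student, and Employee here are stand-ins for the slide's classes):

```java
// Array covariance: the first assignment compiles because
// Student[] IS-A Person[], but storing an Employee into what is
// really a Student[] fails at run time with ArrayStoreException.
public class CovarianceDemo {
    static class Person {}
    static class Student extends Person {}
    static class Employee extends Person {}

    public static boolean storeFails() {
        Person[] people = new Student[1];   // compiles: arrays are covariant
        try {
            people[0] = new Employee();     // compiles: Employee IS-A Person
            return false;                   // never reached
        } catch (ArrayStoreException e) {
            return true;                    // the JVM catches the type confusion
        }
    }
}
```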

Generic Classes & Interfaces

The class or interface declarations include type parameters enclosed in angle brackets <> after the class or interface name.
Generic types are available in Java 5 and higher.
Type checking will occur at compile time rather than at run time.

More on Generic Types

A generic type has one or more type variables
Type variables are instantiated with class or interface types
Cannot use primitive types, e.g., no ArrayList<int>
When defining generic classes, use the type variables in the definition:

public class ArrayList<E>
{
    public E get(int i) { . . . }
    public E set(int i, E newValue) { . . . }
    . . .
    private E[] elementData;
}

Generic classes are converted by the compiler to non-generic classes by a process known as type erasure.
The benefit is that the programmer does not have to place casts in the code, and the compiler will do significant type checking.

Generic Classes & Interfaces

Type parameters are usually single capital letters.
For example,
public interface Map<K, V> {}
defines a generic Map interface with two type parameters, K and V.

Generic Methods
Generic method = method with type parameter(s)

public class Utils
{
    public static <E> void fill(ArrayList<E> a, E value, int count)
    {
        for (int i = 0; i < count; i++)
            a.add(value);
    }
}

A generic method in an ordinary (non-generic) class
Type parameters are inferred at the call site:

ArrayList<String> ids = new ArrayList<String>();
Utils.fill(ids, "default", 10); // calls Utils.<String>fill

Autoboxing / Unboxing
Java 5 adds autoboxing and unboxing features.
Autoboxing: If an int is passed in a place where an Integer is required, the compiler will insert a call to the Integer constructor behind the scenes.
Auto-unboxing: If an Integer is passed in a place where an int is required, the compiler will insert a call to the intValue method behind the scenes.
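Both conversions at work (a minimal sketch; the helper name is my own):

```java
import java.util.ArrayList;
import java.util.List;

// Autoboxing: the int arguments are boxed into the Integers the
// list requires. Auto-unboxing: get() returns Integer, which is
// unboxed so the + operator can add plain ints.
public class BoxingDemo {
    public static int sumFirstTwo(int a, int b) {
        List<Integer> vals = new ArrayList<Integer>();
        vals.add(a);                        // autoboxing: int -> Integer
        vals.add(b);
        return vals.get(0) + vals.get(1);   // auto-unboxing: Integer -> int
    }
}
```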

The Diamond Operator

The diamond operator simplifies the code when the type parameter is known.
Since the type parameter is known, we can use the diamond operator throughout.
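A small illustration (the diamond operator itself arrived in Java 7; the map here is a hypothetical example, not the slide's):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// With <>, the compiler infers the type arguments on the
// right-hand side from the declared type on the left.
public class DiamondDemo {
    public static int demo() {
        Map<String, List<Integer>> scores = new HashMap<>();  // not new HashMap<String, List<Integer>>()
        scores.put("quiz1", new ArrayList<>());
        scores.get("quiz1").add(95);
        return scores.get("quiz1").get(0);
    }
}
```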

Wildcards with Bounds

Wildcards are used to express subclasses (or superclasses) of parameter types.
Now we can pass in Shape's subclasses, such as Circle and Square.
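A sketch with a hypothetical Shape hierarchy (the slide's figure did not survive): the bounded wildcard lets totalArea accept a list of Shape or of any subclass of Shape.

```java
import java.util.List;

// List<? extends Shape> accepts List<Shape>, List<Square>, etc.
// A plain List<Shape> parameter would reject List<Square>.
public class WildcardDemo {
    public abstract static class Shape { public abstract double area(); }
    public static class Square extends Shape {
        private final double side;
        public Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    public static double totalArea(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes)
            total += s.area();
        return total;
    }
}
```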

Generic Static Methods

A generic static method is a generic method with its type parameter declared explicitly (remember: this is a generic method, not a generic class or interface).
A generic static method forces type checking at compile time instead of run time.

Type Bounds
The type bound specifies properties that the parameter types must have.

Type Bound: AnyType IS-A Comparable<T>, where T is a superclass of AnyType
Without the bound, the compiler cannot prove that the call to compareTo is valid.
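The standard findMax idiom using exactly this bound:

```java
// AnyType must be comparable to itself or to one of its
// superclasses; this is what lets the compiler prove that the
// compareTo call below is valid.
public class BoundedMax {
    public static <AnyType extends Comparable<? super AnyType>> AnyType findMax(AnyType[] arr) {
        int maxIndex = 0;
        for (int i = 1; i < arr.length; i++)
            if (arr[i].compareTo(arr[maxIndex]) > 0)
                maxIndex = i;
        return arr[maxIndex];
    }
}
```

Compare with the raw-Comparable version in Section 1.4: no casts are needed here, and a mixed-type array is rejected at compile time.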

2. Algorithm Analysis

Data Structures and Algorithms


Angus Yeung, Ph.D.

Agenda for Lecture 3

WEEK #2
LECTURE #3

Book-keeping
Add Codes
Assignment #1

Generic Implementation
The Diamond Operator
Wildcards with Bounds
Generic Static Methods
Type Bounds

Target for this lecture: finish the basis of algorithmic analysis

Algorithmic Analysis
Relative Rates of Growth
Big-Oh Analysis
Upper and Lower Bounds
Typical Growth Rates
Common Order-of-Growth Classifications


Purpose of this Chapter

An algorithm is a specified set of simple instructions to be followed to solve a problem correctly.
Once an algorithm is given for a problem, we determine how much resource, such as time and space, the algorithm will require.
As such, we want to discuss the following aspects in this chapter:

Time Estimation
How to estimate the time required for a program
Reduction of Running Time
How to reduce the running time of a program, for example, from days or years to fractions of a second
Recursion
The results of careless use of recursion
Efficient Algorithms
Very efficient algorithms to raise a number to a power and to compute the greatest common divisor of two numbers

Mathematical Background
Throughout this course we will use the following four definitions to establish a relative order among functions:

Relative Rates of Growth

Given two functions, there are usually points where one function is smaller than the other, so it doesn't make sense to claim, for instance, f(N) < g(N)
We compare two functions using their relative rates of growth
Example: Compare f(N) = 1,000N with g(N) = N²
If N < 1,000, f(N) > g(N)
If N > 1,000, f(N) < g(N)

Definition 2.1 and Big-Oh

Definition 2.1: T(N) = O(f(N)) if there are positive constants c and n0 such that T(N) ≤ c f(N) when N ≥ n0.

Definition 2.1 explained:
There is some point n0 past which c f(N) is always at least as large as T(N)
If the constant factors are ignored, f(N) is at least as big as T(N)

Let's illustrate this definition with our example:
Case 1: T(N) = 1,000N, f(N) = N², n0 = 1,000, and c = 1
Case 2: T(N) = 1,000N, f(N) = N², n0 = 10, and c = 100

Bad style: f(N) ≤ O(g(N))
Wrong: f(N) ≥ O(g(N))

We use Big-Oh notation in our expression:
1,000N = O(N²) ("order N-squared" or "Big-Oh N-squared")

In summary, Definition 2.1 says that:
Growth rate of T(N) ≤ growth rate of f(N)

Definitions 2.2 - 2.4

Definition 2.2 (Big-Omega) explained: Growth rate of T(N) ≥ growth rate of g(N)

Definition 2.3 (Big-Theta) explained: Growth rate of T(N) = growth rate of h(N)

Definition 2.4 (little-oh) explained: Growth rate of T(N) < growth rate of h(N)

Upper and Lower Bounds

T(N) = O(f(N)) means f(N) is an upper bound on T(N)
Example: N² = O(N³)
Example: N² = O(2N²)

T(N) = Ω(f(N)) means f(N) is a lower bound on T(N)
Example: N³ = Ω(N²)
Example: N² = Ω(2N²)

Hint: we can ignore the constant in estimating growth rates.

We want to make the result as tight as possible
For example, if g(N) = 2N², then g(N) = O(N⁴), g(N) = O(N³), and g(N) = O(N²) are all correct.
g(N) = O(N²) gives the best result because it is the tightest possible description of the upper bound.

Upper and Lower Bounds

Upper and lower bounds valid for n > n0 smooth out the behavior of complex functions

Upper and Lower Bounds

O notation: c·g(n) is an upper bound for f(n)
Ω notation: c·g(n) is a lower bound for f(n)
Θ notation: c1·g(n) is an upper bound for f(n) and c2·g(n) is a lower bound for f(n)

Big-Theta, Big-Oh and Big-Omega

Big-Theta, e.g., Θ(N²)
Provides: asymptotic order of growth
Shorthand for: N²; 10 N²; 5 N² + 22 N log N + 3N
Used to: classify algorithms

Big-Oh, e.g., O(N²)
Provides: Θ(N²) and smaller
Shorthand for: 10 N²; 100 N; 22 N log N + 3 N
Used to: develop upper bounds

Big-Omega, e.g., Ω(N²)
Provides: Θ(N²) and larger
Shorthand for: N²; N⁵; N³ + 22 N log N + 3 N
Used to: develop lower bounds

Typical Growth Rates

Order-of-Growth Classifications
Common order-of-growth classifications:
1, log N, N, N log N, N², N³, and 2^N

Order-of-Growth Classifications

Order of growth: 1 (constant)
Typical code: a = b + c;
Description: statement
Example: add two numbers

Order of growth: log N (logarithmic)
Typical code: while (N > 1) { N = N / 2; ... }
Description: divide in half
Example: binary search

Order of growth: N (linear)
Typical code: for (int i = 0; i < N; i++) { ... }
Description: loop
Example: find the maximum

Order of growth: N log N (linearithmic)
Typical code: see mergesort
Description: divide and conquer
Example: mergesort

Order of growth: N² (quadratic)
Typical code: for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) { ... }
Description: double loop
Example: check all pairs

Order of growth: N³ (cubic)
Typical code: for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) for (int k = 0; k < N; k++) { ... }
Description: triple loop
Example: check all triples

Order of growth: 2^N (exponential)
Typical code: see combinatorial search
Description: exhaustive search
Example: check all subsets

Merge Sort - Explained

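A compact sketch of the mergesort idea from the slides above (the lecture's own figures did not survive extraction): split in half, sort each half recursively, merge the sorted halves.

```java
import java.util.Arrays;

// Mergesort: the canonical N log N divide-and-conquer sort.
public class MergeSortDemo {
    public static void mergeSort(int[] a) {
        if (a.length <= 1)
            return;                                    // base case
        int[] left  = Arrays.copyOfRange(a, 0, a.length / 2);
        int[] right = Arrays.copyOfRange(a, a.length / 2, a.length);
        mergeSort(left);                               // sort each half recursively
        mergeSort(right);
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)    // merge the sorted halves
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length)  a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }
}
```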

Merge Sort - Demo


Useful Rules
Rule 1
If T1(N) = O(f(N)) and T2(N) = O(g(N)), then
T1(N) + T2(N) = O( f(N) + g(N) )
T1(N) * T2(N) = O( f(N) * g(N) )

Rule 2
If T(N) is a polynomial of degree k, then
T(N) = Θ( N^k )

Rule 3
log^k N = O(N) for any constant k
This tells us that logarithms grow very slowly.

Agenda for Lecture 4

WEEK #2
LECTURE #4

Review of Lecture #3
Big-Oh Revisited

Book-keeping
Assignment #1

Algorithmic Analysis (Chapter 2)
L'Hôpital's rule
Memory Usage
Sample Problem: Maximum Subsequence Sum Problem
Algorithm #1: O(N³)
Algorithm #2: O(N²)
Algorithm #3: O(N log N)
Algorithm #4: O(N)

Some Tips for Big-Oh

Tip 1: It is very bad style to include constants or low-order terms inside a Big-Oh.
Don't write this:
T(N) = O(2N²)
T(N) = O(N² + N)
Write this:
T(N) = O(N²)

Tip 2: To compare the relative growth rates of two functions f(N) and g(N), use L'Hôpital's rule if necessary (most of the time, this method is overkill).
For example, compute lim N→∞ f(N)/g(N):
The limit is 0: f(N) = o(g(N)), i.e., g(N) grows faster
The limit is c ≠ 0: f(N) = Θ(g(N)), i.e., f(N) and g(N) grow at the same rate
The limit is ∞: g(N) = o(f(N)), i.e., f(N) grows faster
The limit does not exist: there is no relation

Downloading a File from the Internet

Problem: downloading a file from the Internet:
Initial delay: 3 seconds
Download speed: 1.5 MB/s
File size: N MB

Downloading time: T(N) = N/1.5 + 3
Analysis: T(1,500) ≈ 2 T(700) (a linear function)
Big-Oh: T(N) = O(N)
Even though T(N) = Θ(N) would be more precise, Big-Oh answers are more typical.

How about Memory?


Memory Usage


One More Example


What to Analyze
Running time is the most important resource to analyze.
Several factors can affect the running time:
Computer
Compiler
Programming Language
Need to consider the implementation inefficiency of a programming language
Algorithm
Use the average-case performance, not the best-case performance
Input to the Algorithm
Use Tworst(N), not Tavg(N)

Unless otherwise specified, we use the worst-case running time as the quantity to analyze

Max Subsequence Sum Problem

Given (possibly negative) integers A1, A2, ..., AN, find the maximum value of the sum of a contiguous subsequence.
For example:
For input -2, 11, -4, 13, -5, -2, the answer is 20 (A2 through A4)

There are many algorithms to solve this problem.
Algorithm 4 is clearly the best choice for large amounts of input.

Max Subsequence Sum Problem

The figure shows the growth rates of the running times of the four algorithms.

Max Subsequence Sum Problem

For larger values of N, the performance merit of each algorithm becomes more evident.

A Simple Running Time Example

The cost of this simple program is 6N + 4 units, so we say that this method is O(N):
Declarations: count 0
Initializing the running sum: count 1
For loop: initialization 1, testing N + 1, incrementing N
Loop body (two multiplications, one addition, one assignment): 4N
Return statement: count 1

General Rules
Rule 1: for loops
The running time of the statements inside the for loop × the number of iterations = total
Example: 4 × N = 4N, which is O(N)

Rule 2: Nested Loops
The running time of the statement × the product of the sizes of all the loops = total
Example: O(N²)

General Rules
Rule 3: Consecutive Statements
Just add (the maximum is the one that counts)
Example: O(N) + O(N²) = O(N²); the lower order can be ignored, the higher order counts

Rule 4: if/else
Running time of the test plus the larger of the running times of the statements inside the if branch and the else branch
Example:

Algorithm 1
Cubic maximum contiguous subsequence sum algorithm
Three nested loops give O(N³); the O(N²) lower-order work can be ignored.

Algorithm 2
Eliminating Lines 13 and 14 in Algorithm 1, we can reduce the running time to O(N²).

Algorithm 3
We can use a divide-and-conquer strategy and further improve the solution to O(N log N).
The idea is to split the problem into two roughly equal subproblems and solve them recursively.

The maximum subsequence sum can either occur entirely in the left half of the input, or entirely in the right half, or cross the middle into both halves.
We use recursive calls to find the maximum subsequence sums in the left half and the right half of the input.
The best sums reaching the middle from both halves can be added together to determine if the maximum subsequence sum crosses the middle.

Algorithm 3
We can use a divide-and-conquer strategy and simplify the solution to O(N log N).
Key points in the code: the stopping condition, the special case for an odd number of input entries, and the recursive calls.
Starting point: the calling function for Algorithm 3.

Algorithm 3

Let T(N) be the time to solve a maximum subsequence sum problem of size N, and let T(1) be one unit.
T(1) = 1, T(N) = 2T(N/2) + O(N)
Observation: T(2) = 2·2, T(4) = 4·3, T(8) = 8·4, T(16) = 16·5
Conclusion: If N = 2^k, then T(N) = N(k+1) = N log N + N = O(N log N)
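The conclusion above can also be derived by telescoping: divide the recurrence through by N (taking the O(N) term as N itself, matching the slide's units) and add up the resulting equations.

```latex
\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + 1,\qquad
\frac{T(N/2)}{N/2} = \frac{T(N/4)}{N/4} + 1,\qquad
\ldots,\qquad
\frac{T(2)}{2} = \frac{T(1)}{1} + 1
```

There are log N equations; adding them, the intermediate terms cancel, leaving T(N)/N = T(1)/1 + log N = 1 + log N, so T(N) = N log N + N = O(N log N), exactly the N(k+1) observed above.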

Algorithm 4
If we don't need to know the actual best subsequence, the design of the algorithm can be further simplified to O(N).

One loop only!
One pass through the data
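A sketch of this one-pass O(N) algorithm (in the spirit of the textbook's Algorithm 4; the class name here is my own): keep a running sum, and reset it to 0 whenever it goes negative, since a negative prefix can never start a best subsequence.

```java
// Maximum subsequence sum in one pass through the data.
public class MaxSubSum {
    public static int maxSubSum(int[] a) {
        int maxSum = 0, thisSum = 0;
        for (int j = 0; j < a.length; j++) {
            thisSum += a[j];
            if (thisSum > maxSum)
                maxSum = thisSum;      // new best subsequence ends at j
            else if (thisSum < 0)
                thisSum = 0;           // a negative prefix can't help; start over
        }
        return maxSum;
    }
}
```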

3. Lists, Stacks and Queues

Data Structures and Algorithms

Angus Yeung, Ph.D.


Agenda for Lecture 6

WEEK #3
LECTURE #6

Lists, Stacks and Queues (Chapter 3)
Abstract Data Types (ADTs)
The List ADT
Simple Array Implementation of Lists
Simple Linked Lists

Lists in the Java Collections API
Collection Interface
Iterators
The List Interface, ArrayList, and LinkedList
ListIterators

Implementation of ArrayList
The Basic Class
The Iterator and Java Nested and Inner Classes

Java Review:
Static Class
Inner Class
Local Inner Class
Anonymous Inner Class

Chapter 3 Overview
Chapter 3 discusses some of the simplest and most basic data structures:
Introduce the concept of Abstract Data Types (ADTs)
Show how to efficiently perform operations on lists
Introduce the Stack ADT and its use in implementing recursion
Introduce the Queue ADT and its use in operating systems and algorithm design

Abstract Data Types

An abstract data type (ADT) is a set of objects together with a set of operations.
A mathematical abstraction
Objects: lists, sets, graphs, etc.
Operations: add, remove, contains, union, find, etc.
Implementation: may have multiple implementations, hidden away from users

Abstraction: the interface is separated from the implementation in order to hide the details of the implementation
Similar to primitive types in Java: int, double, float, etc.

List ADT
The list ADT views its data much like an array does: elements are accessible via consecutive indices.

Lists are dynamic:
can grow or shrink
length is not fixed

There are never gaps between items in a list

The List ADT


The List ADT


List provides a flexible interface that allows inserting
and removing elements anywhere in the list.
When a list is implemented by using an array:
Pros:
printList is done in linear time.
The findKth operation takes constant time.

Cons:
Trade-off for this flexibility: insertion and removal are O(N), instead of O(1) in
other data structures.
Worst case: inserting into position 0 requires shifting all the
elements in the list up one spot.

List: Shifting
5, 8, 2, 1, 4, 7
add(3,6)
5, 8, 2, 6, 1, 4, 7
removeAt(1) returns 8
5, 2, 6, 1, 4, 7
Elements changed positions

A List may not be


implemented with an
array, so no actual
shifting may occur.
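The shifting example above can be reproduced directly with java.util.ArrayList; note that the slide's add(3,6) means "insert the value 6 at index 3":

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShiftDemo {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(5, 8, 2, 1, 4, 7));

        list.add(3, 6);               // insert 6 at index 3; later items shift right
        System.out.println(list);     // [5, 8, 2, 6, 1, 4, 7]

        int removed = list.remove(1); // remove index 1; later items shift left
        System.out.println(removed);  // 8
        System.out.println(list);     // [5, 2, 6, 1, 4, 7]
    }
}
```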

Simple Array Implementation


Although arrays are created with a fixed capacity,
we can create a different array with double the
capacity when needed.
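The capacity-doubling trick can be sketched in a few lines; this mirrors what ArrayList does internally, though the class and field names here are illustrative:

```java
import java.util.Arrays;

public class GrowableIntArray {
    private int[] items = new int[4]; // fixed capacity to start
    private int size = 0;

    public void add(int x) {
        if (size == items.length)
            // allocate a new array with double the capacity and copy over
            items = Arrays.copyOf(items, items.length * 2);
        items[size++] = x;
    }

    public int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return items[i];
    }

    public int size() { return size; }
}
```

Because each doubling copies N items but pays for N further constant-time adds, add is constant time on average (amortized).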

3.2.1

Simple Linked List


Elements are not stored contiguously
Avoids the shifting cost of insertion and deletion.

Consists of a series of nodes, which are not


necessarily adjacent in memory
printList and findKth operations are no longer
as efficient as in an array implementation

3.2.2

Linked List: Insertion & Deletion


Elements are not stored contiguously
Avoids the shifting cost of insertion and deletion.

Deletion

Insertion

3.2.2

Removing the Last Node


Not so easy to remove the last node:
Search for the node whose next link points to the last node
Change that next link to null
Update the link to the last node

A doubly linked list solves this problem

3.2.2

Java Collections API


Many data structures, e.g., the List ADT, are
implemented in Java's Collections API
in package java.util.

Sample methods in the Java Collection interface:


size: returns the number of items in the collection
isEmpty: returns true if and only if the size is
zero
contains: returns true if x is in the collection
add/remove: adds/removes item x to/from the
collection

3.3.1

Subset of the Collection Interface


The Collection interface extends the Iterable interface.
Classes that implement the Iterable interface can provide a
way to view all their items.

Capitalized I (in Iterable)

This i (in iterator) is not capitalized.

3.3.2

Iterators
Collections that implement the Iterable interface must provide a method
named iterator.
The method iterator returns an object of type Iterator.
Iterator is an interface defined in package java.util and is shown below:

3.3.2

Print All the Items


Each call to next gives the next item in the collection, and hasNext can be
used to tell if there is a next item.
When the compiler sees an enhanced for loop being used on an object that is
Iterable, it mechanically obtains an Iterator and then makes calls to next and
hasNext.

This is an enhanced for loop.
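Since the slide's code listing did not survive extraction, here is a representative version of the print routine (the method name printAll is illustrative); the comments show the iterator calls the enhanced for loop is compiled into:

```java
import java.util.Arrays;
import java.util.List;

public class PrintDemo {
    // The enhanced for loop below is equivalent to:
    //   Iterator<T> it = coll.iterator();
    //   while (it.hasNext()) { T item = it.next(); System.out.println(item); }
    public static <T> void printAll(Iterable<T> coll) {
        for (T item : coll)
            System.out.println(item);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("lists", "stacks", "queues");
        printAll(words);
    }
}
```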

3.3.2

remove using Iterator


Both the Collection and Iterator interfaces contain a
method called remove.
Collection's remove method must first find the item to
remove.
Iterator's remove method removes the last item returned
by next.
This is more efficient in some cases, e.g., removing every other item
in the collection.

Beware of ConcurrentModificationException:


thrown when a structural change (e.g., a call to Collection's
remove) is made to the collection being iterated.
That's another reason to prefer the iterator's remove method:
there is no concurrent-modification problem.

3.3.2

The List Interface


The List interface extends Collection, so it contains
All the methods in the Collection interface, plus
A few others (shown below)
get/set: access or change the item
at an index; the index goes
from 0 to size()-1
remove is overloaded to remove an
item at a specified position

3.3.3

ArrayList and LinkedList


ArrayList provides a growable array
implementation:
Advantage: calls to get and set take constant time
Disadvantage: if the changes are not made at the end, insertion
and removal are expensive

LinkedList provides a doubly linked list


implementation of the List ADT.
Advantage: if the position of the change is known, insertion of
new items and removal of existing items is cheap
Disadvantage: since LinkedList is not easily indexable, calls
to get are expensive (unless they are close to one of the ends)

3.3.3

Making a List
Whether an ArrayList or LinkedList is passed as a
parameter, the running time of the following method is O(N),

because each call to add (to the end of the list) takes constant time.

If we are adding to the front, then


ArrayList takes O(N²)
LinkedList is an O(N) operation

Here we are adding a new item


to the front of the list

3.3.3

Sum of the Numbers in a List


Example: compute the sum of the numbers in a List
Running time for an ArrayList: O(N)
Running time for a LinkedList: O(N²), because calls to
get are O(N) operations.
Running time using an enhanced for loop and the List's
iterator: O(N)

Similarly, we want to use


contains and remove in
Collection because
ArrayList and LinkedList
are inefficient for searches.
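The variants compared above can be sketched as follows (method names are illustrative); the getter loop is the one that degrades to O(N²) on a LinkedList, while the iterator-based loop is O(N) on both:

```java
import java.util.List;

public class SumDemo {
    // O(N) on ArrayList, O(N^2) on LinkedList: each get(i) walks the links.
    public static int sumWithGet(List<Integer> lst) {
        int total = 0;
        for (int i = 0; i < lst.size(); i++)
            total += lst.get(i);
        return total;
    }

    // O(N) on both: the enhanced for loop uses the list's own iterator.
    public static int sumWithIterator(List<Integer> lst) {
        int total = 0;
        for (int x : lst)
            total += x;
        return total;
    }
}
```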

3.3.3

Agenda for Lecture 7


Lists, Stacks and Queues (Chapter 3)
Lists in the Java Collections API (cont'd)
ListIterators

Implementation of ArrayList
The Basic Class
The Iterator and Java Nested and Inner Classes
Java Review:

Static Class
Inner Class
Local Inner Class
Anonymous Inner Class

Implementation of LinkedList

WEEK #4
LECTURE #7

Example: remove Even Numbers


Example: remove all even-valued items in a List
Before: 6, 5, 1, 4, 2

After: 5, 1

Using an ArrayList is a losing strategy!

The remove is not efficient, so the routine takes quadratic time.

Using a LinkedList has problems as well.

The call to get is not efficient, taking quadratic time.


The call to remove is equally inefficient, because it is expensive to get to position i.

3.3.4

Improvement: remove Even Numbers


Improvement 1: Use an iterator instead of get.
Problem: ConcurrentModificationException with
Collection's remove

Improvement 2: Use the iterator's remove to avoid the


ConcurrentModificationException problem.

Before:
ArrayList -> O(N²)
LinkedList -> O(N²)
After:
ArrayList -> O(N²)
because array items must be shifted
LinkedList -> O(N)
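Improvement 2 can be written as follows (a standard pattern; the class and method names here are illustrative):

```java
import java.util.Iterator;
import java.util.List;

public class RemoveEvens {
    // Removes every even-valued item using the iterator's own remove,
    // which is safe during iteration and O(1) per removal on a LinkedList.
    public static void removeEvens(List<Integer> lst) {
        Iterator<Integer> it = lst.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0)
                it.remove(); // removes the last item returned by next
        }
    }
}
```

Calling lst.remove(...) inside this loop instead would trigger a ConcurrentModificationException on the next call to it.next().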

3.3.4

Running Times for Improvement #2


Running times for our code in Improvement #2
LinkedList: linear growth rate O(N)
ArrayList: quadratic growth rate O(N²)

List Type              No. of Items   Running Time (s)
LinkedList<Integer>    800,000        0.039
LinkedList<Integer>    1,600,000      0.073
ArrayList<Integer>     800,000        300
ArrayList<Integer>     1,600,000      1,200

3.3.4

ListIterators
A ListIterator extends the functionality of an Iterator for Lists:

previous and hasPrevious allow traversal of the list from the back to the
front
add places a new item into the list at the current position

Normal starting point: next


returns 5, previous is illegal,
add places an item before 5

next returns 8, previous


returns 5, add places an item
between 5 and 8

next is illegal, previous


returns 9, add places an item
after 9

3.3.5

Implementation of ArrayList
Outline of a usable ArrayList generic class:
MyArrayList

Maintains the underlying array, the array capacity, and the


current number of items stored
Provides a mechanism to change the capacity of the
underlying array
Provides an implementation of get and set
Provides basic routines, such as size, isEmpty, and
clear, as well as remove and two versions of add

Provides a class that implements the Iterator interface,
e.g., next, hasNext, and remove.

3.4

MyArrayList: The Basic Class


3.4.1

MyArrayList: The Basic Class


3.4.1

Inner Class - 1
This iterator version doesn't work because theItems and
size() are not part of the ArrayListIterator class.

Problem with scoping

3.4.2

Inner Class - 2
The iterator is a top-level class and stores the current position and a link to the
MyArrayList. It doesn't work because theItems is private in the
MyArrayList class.
It is defined as private.

It is a HAS-A relationship.

Error here!

3.4.2

Inner Class - 3
This time it works: the iterator is a nested class and stores the current
position and a link to the MyArrayList. It works because the nested class is
considered part of the MyArrayList class.
ArrayListIterator is defined
inside of MyArrayList.

static indicates
a nested class

3.4.2

Inner Class - 4
This one works as well: the iterator is an inner class and stores
the current position and an implicit link to the MyArrayList.
An inner class doesn't have
the static keyword

We are using the implicit link to


the MyArrayList here.

3.4.2

Nested Classes in Java


Static Class: declared as a static member of
another class
Inner Class: declared as an instance member of
another class
Local Inner Class: declared inside an instance
method of another class
Anonymous Inner Class: like a local inner class, but
written as an expression which returns a one-off
object

Static Classes
The nested class has access to its containing
class's private static members (is it useful at all?)

package pizza;

public class Rhino {
    ...

    public static class Goat {
        ...
    }
}

Inner Classes
An inner class is a class declared as a non-static member of another class
The inner class instance has access to the instance members of the
containing class instance.
These enclosing instance members are referred to inside the inner class via
just their simple names, not via this (this in the inner class refers to the
inner class instance, not the associated containing class instance).

package pizza;

public class Rhino {

    public class Goat {
        ...
    }

    private void jerry() {
        Goat g = new Goat();
    }
}

Local Inner Classes


A local inner class is a class declared in the body of a
method.
Such a class is only known within its containing
method, so it can only be instantiated and have its
members accessed within its containing method.
Because a local inner class is neither the member of a
class nor of a package, it is not declared with an access
level.
Access to the containing class's instance members is
as in an instance inner class.

Anonymous Inner Classes


A local inner class is instantiated at most just once
each time its containing method is run.
Use like this:
new *ParentClassName*(*constructorArgs*) {*members*}

Cannot supply your own constructor


Setup using an initializer block: a {} block placed
outside any method
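A short illustration of the syntax above, using an instance initializer block in place of a constructor (the class and variable names are invented for this sketch):

```java
import java.util.ArrayList;

public class AnonymousDemo {
    public static void main(String[] args) {
        // Anonymous subclass of ArrayList<String>; the {} initializer
        // block runs at construction, since an anonymous class cannot
        // declare its own constructor.
        ArrayList<String> greetings = new ArrayList<String>() {
            { add("hello"); add("world"); } // instance initializer block
        };
        System.out.println(greetings.size()); // 2
    }
}
```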

Implementation of LinkedList
MyLinkedList: contains links to both ends, the size of
the list, and a host of methods.
Node: contains data and links to the previous and next
nodes, along with appropriate constructors.
LinkedListIterator: implements Iterator with
next, hasNext, and remove methods.

3.5

Adding a Node


3.5

Removing a Node

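Since the slides' figures did not survive extraction, here is a sketch of the two operations on doubly linked nodes (field names follow the MyLinkedList style, but the exact code is a reconstruction, not the textbook's listing). It assumes sentinel header and tail nodes, so p.prev and p.next are never null:

```java
public class DoublyLinked<T> {
    static class Node<T> {
        T data;
        Node<T> prev, next;
        Node(T d, Node<T> p, Node<T> n) { data = d; prev = p; next = n; }
    }

    // Splice a new node holding x in front of node p (four link changes).
    static <T> Node<T> addBefore(Node<T> p, T x) {
        Node<T> newNode = new Node<>(x, p.prev, p);
        p.prev.next = newNode;
        p.prev = newNode;
        return newNode;
    }

    // Unlink node p by making its neighbors point past it (two link changes).
    static <T> void remove(Node<T> p) {
        p.prev.next = p.next;
        p.next.prev = p.prev;
    }
}
```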

3.5

Agenda for Lecture 8


Book-Keeping
Solutions for Assignment #1
Assignment #2

Lists, Stacks and Queues (Chapter 3)


The Stack ADT
Stack Model
Implementation of Stacks
Applications

The Queue ADT


Queue Model
Array Implementation of Queues

Short Review Questions

WEEK #4
LECTURE #8

The Stack ADT


A stack is a list with the restriction that insertions
and deletions can be performed only at the end of the
list, called the top.
Stack model:
Only the top element is accessible.

Push and pop operations are based on


LIFO (last in, first out)

3.6.1

Implementation of Stacks
Linked List Implementation of Stacks
Push: insert at the front of the list
Top/Pop: return the value of the element at the front
of the list and delete it.

Array Implementation of Stacks


Push: increment topOfStack and set the element at
topOfStack.
Top/Pop: return the value of the element at
topOfStack and decrement topOfStack.
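The array version can be sketched in a few lines (a simplified reconstruction of the idea above; the class name is illustrative, and the array grows by doubling as with ArrayList):

```java
import java.util.Arrays;

public class IntStack {
    private int[] items = new int[4];
    private int topOfStack = -1; // index of the top element; -1 means empty

    public void push(int x) {
        if (topOfStack + 1 == items.length)
            items = Arrays.copyOf(items, items.length * 2); // grow when full
        items[++topOfStack] = x;
    }

    public int top() {
        if (isEmpty()) throw new IllegalStateException("empty stack");
        return items[topOfStack];
    }

    public int pop() {
        int x = top();
        topOfStack--;
        return x;
    }

    public boolean isEmpty() { return topOfStack == -1; }
}
```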

3.6.2

Application: Balancing Symbols


Compilers need to check that all symbols are balanced, e.g.,
[()] is legal, but [(] is wrong.
One can use a stack to balance symbols:

Make an empty stack.


Read characters until end of file.
If the character is an opening symbol, push it onto the stack.
If it is a closing symbol and the stack is empty, report an
error;
Otherwise, pop the stack.
If the symbol popped is not the corresponding opening symbol,
then report an error.
At end of file, if the stack is not empty, report an error.
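The algorithm above, sketched for the three bracket pairs (the method name isBalanced is illustrative; a Deque serves as the stack):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class SymbolChecker {
    public static boolean isBalanced(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{':
                    stack.push(c);                     // opening symbol: push
                    break;
                case ')': case ']': case '}':
                    if (stack.isEmpty()) return false; // closing with no opener
                    char open = stack.pop();
                    if ((c == ')' && open != '(') ||
                        (c == ']' && open != '[') ||
                        (c == '}' && open != '{'))
                        return false;                  // mismatched pair
                    break;
                default:
                    // ignore all other characters
            }
        }
        return stack.isEmpty(); // leftover openers mean an error
    }
}
```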

3.6.3

Application: Postfix Expressions


Postfix or reverse Polish notation expresses an expression:
4.99 * 1.06 + 5.99 + 6.99 * 1.06 =, as
4.99 1.06 * 5.99 + 6.99 1.06 * +

The easiest way to evaluate postfix is to use a stack.


For example, 6 5 2 3 + 8 * + 3 + *

A + is read, so 3 and 2
are popped from the stack.

The sum 5 is pushed.

Next 8 is pushed.

With a *, 8 and 5 are


popped and 40 is pushed.

Next a + is seen, so 40
and 5 are popped and 5 +
40 = 45 is pushed.

Now 3 is pushed.

With a +, 3 and 45 are


popped and 48 is
pushed.

Finally, a * is seen and


48 and 6 are popped. The
result 6 * 48 = 288 is
pushed.
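The worked trace above can be checked with a small evaluator (integer-only sketch; the class and method names are illustrative). Note the right operand is popped first:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluates a space-separated integer postfix expression,
    // e.g. "6 5 2 3 + 8 * + 3 + *".
    public static long evalPostfix(String expr) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String tok : expr.trim().split("\\s+")) {
            switch (tok) {
                case "+": case "-": case "*": case "/": {
                    long b = stack.pop(); // right operand popped first
                    long a = stack.pop();
                    switch (tok) {
                        case "+": stack.push(a + b); break;
                        case "-": stack.push(a - b); break;
                        case "*": stack.push(a * b); break;
                        default:  stack.push(a / b);
                    }
                    break;
                }
                default:
                    stack.push(Long.parseLong(tok)); // operand: push
            }
        }
        return stack.pop(); // the final value left on the stack
    }
}
```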

3.6.3

Infix to Postfix Conversion


A stack can be used to convert an expression in standard form (known as
infix) into postfix.
Given the infix expression:
a + b * c + (d * e + f) * g ,

the postfix is:

a b c * + d e * f + g * +

The stack represents pending operators. Some of the operators on the


stack that have high precedence are known to be completed and should
be popped.

3.6.3

Infix to Postfix Algorithm


Infix to postfix conversion using a stack:
a + b * c + (d * e + f) * g

1. Symbol a -> output, Operator + -> stack, Symbol b -> output.

2. Operator * -> stack; the top entry of the operator stack has lower
precedence -> nothing is output. Symbol c -> output.

3. Operator + -> stack; pop and output * and +.

4. Operator ( -> stack, Symbol d -> output.

5. Operator * -> stack; there is no output because of (. Symbol e -> output.

6. Operator + -> stack; pop and output *. Symbol f -> output.

7. Operator ) is read: output +, empty the ( ).

8. Operator * -> stack, Symbol g -> output.

9. Input is now empty: pop and output


all symbols from the stack.
3.6.3

Application: Method Calls


When a call is made to a new method,

all the variables local to the calling routine need to be


saved by the system.
The current location in the routine must be saved so the
new method knows where to go after it is done.

This can be implemented like the balancing-symbols


algorithm, using a stack.
The information saved on the stack is called an activation
record or stack frame:
Register values are saved
The return address is saved at the top

3.6.3

Running Out of Stack Space


Remember we learned that a recursive function can be a bad idea,
because of:
Overhead associated with the stack frame
Problems with stack overflow

The tail recursion problem.


Can use a while loop
because nothing needs
to be saved

3.6.3

Using a while Loop


The tail recursion problem can be resolved by using a
while loop instead of a recursive function call.

Sometimes the compiler might


automatically detect tail
recursion and use a scheme
similar to the while-loop
implementation.

3.6.3

The Queue ADT


Queue Model:

Array Implementation of Queues

Enqueue: set the back element, currentSize++


Dequeue: return the element at the front, currentSize--
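A circular-array queue along the lines of the next slide can be sketched as follows (wraparound is handled with a modulus; the class name is illustrative and growth is omitted to keep the sketch short):

```java
public class IntQueue {
    private int[] items;
    private int front = 0, back = -1, currentSize = 0;

    public IntQueue(int capacity) { items = new int[capacity]; }

    public void enqueue(int x) {
        if (currentSize == items.length) throw new IllegalStateException("full");
        back = (back + 1) % items.length; // wrap around the end of the array
        items[back] = x;
        currentSize++;
    }

    public int dequeue() {
        if (currentSize == 0) throw new IllegalStateException("empty");
        int x = items[front];
        front = (front + 1) % items.length; // wrap around as well
        currentSize--;
        return x;
    }

    public int size() { return currentSize; }
}
```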

3.7

Circular Array Implementation

3.7.2

Inheritance & Generic Type


Given:

Which statement inserted independently at line 9 will compile? (Choose all
that apply.)
A. return new ArrayList<Inn>();
B. return new ArrayList<Hotel>();
C. return new ArrayList<Object>();
D. return new ArrayList<Business>();

Anonymous Subclass
Given:

What is the result?
A. An exception occurs at runtime
B. true
C. Fred
D. Compilation fails because of an error on line 3
E. Compilation fails because of an error on line 4
F. Compilation fails because of an error on line 8
G. Compilation fails because of an error on a line other than 3, 4, or 8

Inner Class
Given:

Which, inserted independently at line 6, compile and produce the output "spooky"?
(Choose all that apply.)
A. Sanctum s = c.new Sanctum();
B. c.Sanctum s = c.new Sanctum();
C. c.Sanctum s = Cathedral.new Sanctum();
D. Cathedral.Sanctum s = c.new Sanctum();
E. Cathedral.Sanctum s = Cathedral.new Sanctum();

Local Inner Class


Given:

What is the result?
A. inner
B. outer
C. middle
D. Compilation fails
E. An exception is thrown at runtime

4. Trees

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Lists, Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms

Final Exam

Agenda for Lecture 9


Book-Keeping
Assignment #2
Quiz #2

Trees (Chapter 4)
Preliminaries

Implementation of Trees
Tree Traversals with an Application

Binary Trees

Implementation
An Example: Expression Trees

The Search Tree ADT: Binary Search Trees

contains
findMin and findMax
insert
remove
Average-Case Analysis

WEEK #5
LECTURE #9

Chapter 4 Overview
Trees in general are very useful abstractions in
computer science. In Chapter 4, we will
See how trees are used to implement the file system
of several popular operating systems.
See how trees can be used to evaluate arithmetic
expressions.
Show how to use trees to support searching
operations in O(log N) average time, and how to refine
these ideas to obtain O(log N) worst-case bounds.
Discuss and use the TreeSet and TreeMap classes.

4.0

Tree Preliminaries
parent
edge

Grandparent

child

Every node except the


root has one parent

Depth = 2
Grandchild

siblings
Height (the longest path) = 3

4.1

Implementation of Trees
class TreeNode
{
    Object element;
    TreeNode firstChild;
    TreeNode nextSibling;
}

4.1.1

Application: Unix File System

Advantages:
Allows users to organize their data logically.
Two files in different directories can share the same
name.

4.1.2

Pre-order Traversal

Preorder traversal: work at a node is performed


before its children are processed.
Line 1: print name (once per node);
Line 2: test directory (once per node);
Line 4: recursive call (once for each child);

The total amount of work is constant per node: O(N)

4.1.2

Post-order Traversal

Postorder traversal: work at a node is


performed after its children are evaluated.

If the object is not a directory, size returns


the number of blocks it uses;
Otherwise, the number of blocks used by the
directory is added to the number of blocks
(recursively) found in all the children.
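The postorder size computation can be sketched on the first-child/next-sibling TreeNode structure from earlier (the method name totalBlocks and the blocks field are illustrative, not the textbook's listing):

```java
public class FileTree {
    static class TreeNode {
        int blocks;          // blocks used by this file or directory itself
        TreeNode firstChild; // null for plain files
        TreeNode nextSibling;
        TreeNode(int b) { blocks = b; }
    }

    // Postorder: a node's total is computed after its children's totals.
    static int totalBlocks(TreeNode t) {
        int total = t.blocks;
        for (TreeNode c = t.firstChild; c != null; c = c.nextSibling)
            total += totalBlocks(c);
        return total;
    }
}
```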

4.1.2

Binary Trees
Binary Tree

No node can have more than two children;


The depth of an average binary tree is considerably smaller
than N.
Average depth: O(√N)
Depth for a binary search tree: O(log N)
Worst case: O(N)

Worst-case

4.2

Implementation of Binary Trees


Implementation of a binary tree:
A node consists of the element information plus two
references (left and right) to other nodes
We can keep direct links to children because a binary
tree node has at most two children

4.2.1

Example: Expression Trees


An example of binary trees: expression trees

Leaves: operands, e.g., constants or variable names


Internal nodes: operators
(a + b * c) + ((d * e + f) * g)

General strategy:

Inorder traversal (left, node, right):


(a + (b * c)) + (((d * e) + f) * g)
Postorder traversal (left subtree, right subtree, operator):
a b c * + d e * f + g * +

4.2.2

Constructing an Expression Tree


Input (postfix): a b + c d e + * *
Constructing an expression tree using a stack:
1. Push operands a
and b onto a stack

2. Read +, pop two trees


and form a new tree.

3. Read c, d, and e; create a


one-node tree for each.

4. Read +, pop two trees


and form a new tree

5. Read *, pop two trees


and form a new tree.

6. Read *, pop two trees


and form the final tree.

4.2.2

The Search Tree ADT


An important application of binary trees is their use in searching.
Binary search tree: for every node X in the tree:

the values of all the items in its left subtree are smaller than the item in X,
the values of all the items in its right subtree are larger than the item in X.

The average depth of a binary search tree is O(log N).

This is NOT a
binary search tree
This is a binary
search tree

4.3

contains, findMin and findMax


4.3.1 contains
Check the node first, then make a recursive call on a subtree
of T
It can be either the left or right subtree, depending on the
relationship of X to the item stored in T.

4.3.2 findMin and findMax


findMin: start at the root and go left as long as there
is a left child;
findMax: start at the root and go right as long as
there is a right child;

4.3

insert
To insert X into tree T, proceed down the tree as you would
with a contains.
If X is found, do nothing (or update something).
Otherwise, insert X at the last spot on the path traversed.
Example: Adding
5 to the tree.
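contains and insert together can be sketched as follows (a reconstruction in the textbook's spirit, not its exact listing; the class name IntBST is illustrative):

```java
public class IntBST {
    static class Node {
        int element;
        Node left, right;
        Node(int x) { element = x; }
    }

    private Node root;

    public boolean contains(int x) {
        Node t = root;
        while (t != null) {
            if (x < t.element) t = t.left;       // go left for smaller values
            else if (x > t.element) t = t.right; // go right for larger values
            else return true;                    // found
        }
        return false;
    }

    public void insert(int x) { root = insert(x, root); }

    private Node insert(int x, Node t) {
        if (t == null) return new Node(x); // last spot on the path traversed
        if (x < t.element) t.left = insert(x, t.left);
        else if (x > t.element) t.right = insert(x, t.right);
        // else: duplicate found, do nothing
        return t;
    }
}
```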

4.3.3

remove
To remove node X from tree T:

If X is a leaf, it can be deleted immediately.


If X has one child, the node can be deleted after its parent adjusts a link to
bypass the node.
If X has two children, replace the data of this node with the smallest data of
the right subtree and recursively delete that node.

Deletion of node (4) with one child

Deletion of node (2) with two children

4.3.4

Average-Case Analysis
A binary search tree should take O(log N) time because
in constant time we descend a level in the tree, thus
operating on a tree that is now roughly half as large.
Internal path length, D(N): the sum of the depths of all
nodes in an N-node tree
D(N) = D(i) + D(N - i - 1) + N - 1
If all subtree sizes are equally likely, we can put the
average value of the subtree sizes into D(N):

4.3.5

Example: 500-node tree


Average of D(N) = O(N log N)
The expected depth of any node is O(log N)
A 500-node tree has nodes at expected depth 9.98.

4.3.5

Unbalanced Tree
After a quarter-million random insert/remove
pairs, the tree looks decidedly unbalanced (average
depth equals 12.51).

4.3.5

Agenda for Lecture 10


Trees (Chapter 4)
The Search Tree ADT: Binary Search Trees
Review of source code

AVL Trees
Single Rotation
Double Rotation
Review of source code

Splay Trees
Splaying

B-Trees

WEEK #5
LECTURE #10

AVL Trees
AVL (Adelson-Velskii and Landis) tree: a binary search tree with
a balance condition.
Requiring the left and right subtrees to have the same height is too strict
Identical to a binary search tree, except that for every node, the
heights of the left and right subtrees can differ by at most 1.
Height of an AVL tree: at most 1.44 log(N+2) - 1.328

AVL

Unbalanced

Unbalanced

4.4

Smallest AVL Tree of Height 9


Example: smallest AVL tree of height 9

Fewest nodes (143)


Left subtree: a height-7 tree of minimum number of nodes
Right subtree: a height-8 tree of minimum size

4.4

Rebalancing for Insertion


Rebalancing the node α
Height imbalance: α's two subtrees' heights differ by two
Insertion on the outside: use a single rotation for balancing
Case 1: an insertion into the left subtree of the left child of α.
Case 4: an insertion into the right subtree of the right child of α.

Insertion on the inside: use a double rotation for balancing


Case 2: an insertion into the right subtree of the left child of α.
Case 3: an insertion into the left subtree of the right child of α.

4.4

Single Rotation: Fixing Case 1


Violation of the AVL balance property

After insertion in Case 1, the left subtree of node k2 is two levels deeper


than its right subtree

Rebalancing

Move X up a level and Z down a level


Grab k1 and shake it, letting gravity take hold
Subtree Y now becomes the left child of k2
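The Case 1 fix can be sketched as the classic rotate-with-left-child routine (a standard reconstruction, not the textbook's exact listing; the node class is simplified and heights are recomputed naively):

```java
public class AvlRotation {
    static class Node {
        int element;
        Node left, right;
        Node(int x, Node l, Node r) { element = x; left = l; right = r; }
    }

    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Single rotation for Case 1: k2's left child k1 becomes the new root
    // of this subtree; k1's old right subtree Y becomes k2's left child.
    static Node rotateWithLeftChild(Node k2) {
        Node k1 = k2.left;
        k2.left = k1.right; // subtree Y moves across
        k1.right = k2;      // k2 drops a level
        return k1;          // k1 is the new subtree root
    }
}
```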

4.4.1

Single Rotation: Example


Unbalanced tree after insertion of node 6
Balanced tree after a single rotation between 7 and 8
unbalanced

The child node
becomes the
new root

4.4.1

Single Rotation: Fixing Case 4


Case 4 represents a case symmetric to Case 1

4.4.1

Single Rotation: Example


Start with an empty AVL tree, and insert the items 3,
2, 1, and then 4 through 7 in sequential order.
Inserting 3, 2, 1

Inserting 4, 5

Inserting 6

Inserting 7

4.4.1

Double Rotation
A single rotation fails to fix Cases 2 or 3.

4.4.2

Double Rotation
Left-right double rotation to fix Case 2

http://www.cs.uah.edu/~rcoleman/CS221/Trees/AVLTree.html

4.4.2

Double Rotation
Right-left double rotation to fix Case 3

http://www.cs.uah.edu/~rcoleman/CS221/Trees/AVLTree.html

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 16, 15: right-left double rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 14: right-left double rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 13: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 12: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 11, 10: single rotation

4.4.2

Double Rotation: Example


From the Single Rotation example, insert 10 through 16 in
reverse order, followed by 8 and then 9.
Inserting 8: no rotation
Inserting 9: double rotation

4.4.2

Agenda for Lecture 11


Trees (Chapter 4)
Quiz 2
Book-Keeping

WEEK #6
LECTURE #11

Assignment #2 is extended to March 4


Visual Go

Splay Trees
B-Trees
Sets and Maps
Review of AVL source code (if time allows)

Splay Trees
Basic ideas for splay trees:

Not a new type of tree, but a re-implementation of the


binary search tree insert, delete, and search methods
The goal is to improve their performance

No single operation on a splay tree is guaranteed to


have better performance

But a series of M operations will take O(M log N) time for a


tree of N nodes, whenever M > N

Not highly balanced like an AVL tree

Lowering the cost of an entire series of operations is more


important than keeping the tree balanced

4.5

Splay Trees
Whenever a splay tree node is accessed, the tree
performs splaying operations that move the accessed
node to the root of the tree.
Splaying a node consists of a series of rotations.
Similar to AVL tree rotations.

The goal is to move the accessed node to the root.


A side benefit is to make the tree more balanced.

The theory is that once a node has been accessed, it


will soon be accessed again.
Future accesses are fast if the node is the root.

4.5

Splay Trees
If a node has not been accessed in a while, you
pay the performance penalty of splaying the next
time it is accessed.
But access of that node in the near future is very fast.
So we amortize the cost of splaying over future
operations.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Zig-zag
Strategy: rotate bottom-up along the access path.
If X is a right child and P is a left child (or vice versa): perform a double rotation.

If both X and P are left children (or both right children): transform the tree as below:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Example
Example, with a contains on k1.

k1 is a zig-zag, so perform a double rotation using k1, k2, k3:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Example
k1 is a zig-zag again, so we do a rotation with k1, k4, and k5.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
How is a worst-case BST created?
When all the nodes are inserted in sorted order.
Suppose the bottom node is accessed in such a tree:

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 1

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 2

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 3

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 4

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 5

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 6

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 7

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 8

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

Splay Trees
Splaying at node 9

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.5

B-trees
A B-tree is a tree data structure suitable for disk
drives.
It may take up to 11 ms to access data on disk.
Today's CPUs can execute billions of
instructions per second.
Therefore, it makes sense to spend CPU cycles
to reduce the number of disk accesses.

B-trees are often used to implement databases.


CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
A B-tree is an M-ary tree (allowing for M-way branching).

A 5-ary tree of 31 nodes has only three levels.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree of order 5

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree after insertion of 57 into the tree

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
Insertion of 55 into the B-tree causes a split into two leaves

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
Insertion of 40 causes a split into two leaves and then a split of
the parent node.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

B-trees
B-tree after the deletion of 99 from the B-tree

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.7

Sets
The Set interface is inherited from Collection.
Does not allow duplicates.
Operations: insert, remove, etc.
Very efficient basic search.

SortedSet interface:

all items are kept in sorted order.

TreeSet implements the SortedSet interface.

Basic operations take logarithmic worst-case time.

Items must implement the Comparable (or Comparator) interface.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.8.1
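The TreeSet behavior described above can be demonstrated in a few lines. A minimal sketch (the class name TreeSetDemo is mine) showing that duplicates are ignored and that iteration is in sorted order:

```java
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        set.add(5); set.add(1); set.add(3);
        set.add(3);                      // duplicate insert is ignored
        if (set.size() != 3) throw new AssertionError();
        // first()/last() are O(log N); iteration visits items in sorted order
        if (set.first() != 1 || set.last() != 5) throw new AssertionError();
        System.out.println(set);         // prints [1, 3, 5]
    }
}
```

Integer implements Comparable, which is what gives TreeSet its ordering.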

Maps
The Map interface, unlike Set, is not part of the Collection hierarchy.

Consists of keys and their values.

Keys must be unique, but several keys can map to the same value.
Basic operations: isEmpty, clear, size, containsKey, get,
put.

SortedMap interface:

all items are in logically sorted order.

Iterating through a Map can be tricky because a Map has no
iterator. Instead these methods are used:

keySet: we can return a Set
here since keys are unique.

entrySet: returns a Set of entries.


CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

4.8.2
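The keySet/entrySet idea above can be sketched concretely. This example (class name and data are mine) shows unique keys, value replacement on re-put, and iteration over the entry Set in place of a direct iterator:

```java
import java.util.HashMap;
import java.util.Map;

public class MapIterationDemo {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("ann", 30);
        ages.put("bob", 25);
        ages.put("ann", 31);             // re-putting a key replaces its value
        if (ages.size() != 2) throw new AssertionError();

        // A Map has no iterator of its own: iterate over the entry Set instead
        int total = 0;
        for (Map.Entry<String, Integer> e : ages.entrySet())
            total += e.getValue();
        if (total != 56) throw new AssertionError();

        // keySet() returns a Set because keys are unique
        if (!ages.keySet().contains("bob")) throw new AssertionError();
    }
}
```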

Summary

Tree | Description | Application

Binary Trees | At most two child nodes. | Easy to implement.

Binary Search Trees | Left subtree is less than X; right subtree is larger than X. | Efficient search: node visits = height of the tree.

AVL Trees | The heights of the left and right subtrees differ by not more than one level. Always rebalance after insertion or removal (single/double rotations). | Efficient search: always maintains minimum height of the tree.

Splay Trees | Once a node is accessed, it should be made very accessible for the future. This is done by moving the accessed node to the root. | Repeated access of a recently accessed node is very efficient.

B-Trees | Implementation of a balanced M-ary search tree, allowing M-way branching. | Efficient search for data on disk drives.

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

5. Hashing

Dr. Angus Yeung

Course Structure
Founda'on

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integra'on

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

List Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

Final Exam

Agenda for Lecture 12


Book-Keeping
Review of Quiz #2

Hashing (Chapter 5)
General Idea
Hash Function
Separate Chaining
Hash Tables without Linked Lists
Rehashing

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #6
LECTURE #12

Why Hash Tables?


Hash tables are good for doing a quick search on things.
For instance, suppose we have an array full of data (say 100 items). If we knew
the position where a specific item is stored in the array, then we could
quickly access it.

For instance, if we happen to know that the item we want is at position
3, we can apply: myitem = myarray[3];
With this, we don't have to search through each element in the array; we
just access position 3.
The question is, how do we know that position 3 stores the data we are
interested in?

This is where hashing comes in handy. Given some key, we can apply a
hash function to it to find the index or position that we want to access.

Chapter Overview
Hashing:
Implementation of hash tables
Technique for performing insertions, deletions, and
searches in constant average time

We will cover the following in this chapter:


Several methods of implementing the hash table
Comparing these methods analytically
Numerous applications of hashing
Comparing hash tables with binary search trees
5.0

General Idea
Hash table data structure: an array of
some fixed size, containing the items.
Each item could consist of a key and
additional data fields.
Hash function:
the mapping that converts each key into some
number in the range 0 to TableSize - 1,
used to place the item in the appropriate cell.
It should distribute the keys evenly among the cells.

Collision:
when two keys hash to the same value.
5.1

Hash Function
Some simple hash functions:
Keys are integers: return Key mod TableSize.
Keys are strings: add up the ASCII values of the
characters in the string.

(What if TableSize = 10,007 and a typical key hashes
to at most 127 * 8 = 1,016? That will not be a
good and equitable distribution.)

5.2
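The two simple hash functions above can be sketched as follows (class and method names are mine). The main method also demonstrates why the ASCII-sum function distributes poorly for a large table:

```java
public class SimpleHash {
    // Integer keys: Key mod TableSize (floorMod avoids negative results).
    public static int hash(int key, int tableSize) {
        return Math.floorMod(key, tableSize);
    }

    // String keys: sum of character values -- simple but poorly distributed.
    public static int hash(String key, int tableSize) {
        int sum = 0;
        for (char ch : key.toCharArray()) sum += ch;
        return sum % tableSize;
    }

    public static void main(String[] args) {
        // With tableSize = 10,007, an 8-character key sums to at most
        // 127 * 8 = 1,016, so only the low end of the table is ever used.
        if (hash("aaaaaaaa", 10007) > 8 * 127) throw new AssertionError();
        if (hash(23, 7) != 2) throw new AssertionError();
    }
}
```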

Hash Function
Some simple hash functions:
Keys have at least three characters: hash on the first three.

Problem: English is not random. The 26^3 = 17,576 possible
combinations actually reduce to only 2,851 combinations.
5.2

Hash Function
A good hash function:
Mapping using:

Handling the possible overflow problem
5.2
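The formulas on this slide were lost in extraction; a common "good" hash function of the kind the textbook uses is a base-37 polynomial of the key's characters, evaluated by Horner's rule, with the overflow handled by fixing up a negative result after the mod. A sketch under that assumption:

```java
public class PolynomialHash {
    // Horner's-rule evaluation of a base-37 polynomial of the characters,
    // taken mod tableSize at the end.
    public static int hash(String key, int tableSize) {
        int hashVal = 0;
        for (int i = 0; i < key.length(); i++)
            hashVal = 37 * hashVal + key.charAt(i);   // may overflow to negative
        hashVal %= tableSize;
        if (hashVal < 0)                              // handle the overflow case
            hashVal += tableSize;
        return hashVal;
    }

    public static void main(String[] args) {
        int h = hash("junk", 10007);
        if (h < 0 || h >= 10007) throw new AssertionError();  // always in range
    }
}
```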

Separate Chaining

One collision resolution strategy is separate chaining.


Separate chaining: keep a list of all elements that hash to the same value.
Example: hash(x) = x mod 10
The load factor of a hash table is defined as the ratio of the number of
elements in the hash table to the table size.

5.3
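The separate chaining idea above, with hash(x) = x mod 10, can be sketched as an array of linked lists (a minimal illustration, not the textbook's full implementation):

```java
import java.util.LinkedList;

public class ChainedHashTable {
    private final LinkedList<Integer>[] lists;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        lists = new LinkedList[size];
        for (int i = 0; i < size; i++) lists[i] = new LinkedList<>();
    }

    private int hash(int x) { return Math.floorMod(x, lists.length); }

    public void insert(int x) {
        LinkedList<Integer> chain = lists[hash(x)];
        if (!chain.contains(x)) chain.add(x);   // colliding keys share one list
    }

    public boolean contains(int x) { return lists[hash(x)].contains(x); }

    public static void main(String[] args) {
        ChainedHashTable t = new ChainedHashTable(10);  // hash(x) = x mod 10
        t.insert(4); t.insert(14); t.insert(24);        // all collide in list 4
        if (!t.contains(14) || t.contains(34)) throw new AssertionError();
    }
}
```

The load factor here is 3/10; search cost grows with the average chain length.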

Linear Probing

Besides separate chaining, another strategy is called open addressing.


Probing hash tables: do not use separate chaining.
Linear probing: when a collision happens, try cells sequentially (with wraparound) in search of an
empty cell.
Primary clustering: even if the table is relatively empty, blocks of occupied cells start forming.

5.4.1
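A minimal sketch of linear probing with wraparound (class name is mine; it omits deletion and rehashing), using keys 89, 18, 49 in a size-10 table so that 49 collides with 89 and wraps to cell 0:

```java
public class LinearProbingTable {
    private final Integer[] cells;

    public LinearProbingTable(int size) { cells = new Integer[size]; }

    public void insert(int x) {
        int pos = Math.floorMod(x, cells.length);
        while (cells[pos] != null && cells[pos] != x)  // try cells sequentially,
            pos = (pos + 1) % cells.length;            // wrapping around
        cells[pos] = x;
    }

    public int positionOf(int x) {                     // -1 if absent
        int pos = Math.floorMod(x, cells.length);
        while (cells[pos] != null) {
            if (cells[pos] == x) return pos;
            pos = (pos + 1) % cells.length;
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        t.insert(89); t.insert(18); t.insert(49);      // 49 collides at cell 9
        if (t.positionOf(49) != 0) throw new AssertionError();  // wrapped to 0
        if (t.positionOf(18) != 8) throw new AssertionError();
    }
}
```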

Linear Probing

Number of probes plotted against load factor for linear probing (dashed) and random
strategy (S is successful search, U is unsuccessful search, and I is insertion)

5.4.1

Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the
primary clustering problem of linear probing: f(i) = i^2
When a collision occurs, the next position attempted is one cell away
(then 4, 9, ... cells away).

5.4.2

Double Hashing
Secondary clustering: although quadratic probing eliminates primary clustering,
elements that hash to the same position will probe the same alternative
cells.
Double hashing eliminates secondary clustering with f(i) = i * hash2(x):
apply a second hash function to x and probe at a distance hash2(x),
2*hash2(x), and so on.

5.4.3

Rehashing
Problems when the table gets too full:
Running time for the operations will take too long.
Insertions might fail for open addressing hashing with
quadratic resolution, especially if there are too many
removals intermixed with insertions.

Solution:
Build another table that is about twice as big (with an
associated new hash function).
Scan down the entire original hash table and compute the
new hash value for each (non-deleted) element.
Insert the new hash values in the new table.
5.5
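The rehashing steps above can be sketched for a tiny linear probing table. This is my own illustration: it rehashes from size 7 to the next prime after doubling (17) once the table is half full, which differs from the slide's 70% example but follows the same build-scan-reinsert procedure:

```java
import java.util.ArrayList;
import java.util.List;

public class RehashDemo {
    static Integer[] table = new Integer[7];   // h(x) = x mod 7, linear probing

    static void insert(int x) {
        if (occupied() >= table.length / 2) rehash();   // table too full
        int pos = Math.floorMod(x, table.length);
        while (table[pos] != null) pos = (pos + 1) % table.length;
        table[pos] = x;
    }

    static int occupied() {
        int n = 0;
        for (Integer v : table) if (v != null) n++;
        return n;
    }

    static void rehash() {
        List<Integer> old = new ArrayList<>();
        for (Integer v : table) if (v != null) old.add(v); // scan old table
        table = new Integer[17];            // about twice as big, prime: 17
        for (int v : old) {                 // recompute every hash value
            int pos = v % 17;
            while (table[pos] != null) pos = (pos + 1) % 17;
            table[pos] = v;
        }
    }

    public static void main(String[] args) {
        for (int x : new int[] {13, 15, 6, 24, 23}) insert(x);
        if (table.length != 17) throw new AssertionError();  // rehash happened
        if (occupied() != 5) throw new AssertionError();     // nothing was lost
    }
}
```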

Rehashing
Hash function h(x) = x mod 7
After inserting 23, the hash table is over 70% full.

5.5

Rehashing
Linear probing hash table after rehashing.
New hash function: h(x) = x mod 17

5.5

Agenda for Lecture 13


Book-Keeping
Hashing (Chapter 5)

WEEK #7
LECTURE #13

Hash Tables in the Standard Library


Hash Tables with Worst-Case O(1) Access
Perfect Hashing
Cuckoo Hashing
Hopscotch Hashing

Extendible Hashing

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

Implementation of Hash Tables


Java includes hash table implementations of Set and Map: HashSet
and HashMap.
Must provide equals and hashCode methods.
Implemented using separate chaining hashing.
Use them when we don't care about sorted order.

For the word-changing example:

A map in which the key is a word length, and the value is a collection of all words
of that word length. -> HashMap
A map in which the key is a representative, and the value is a collection of all
words with that representative. -> HashMap
A map in which the key is a word, and the value is a collection of all words that
differ in only one character from that word. -> HashMap

The performance of a HashMap can often be superior to a TreeMap.


The best strategy is to use the interface type Map, then change the
instantiation from a TreeMap to a HashMap, and perform timing tests.
5.6

Implementing hashCode
If a class overrides equals, it must override hashCode.
When they are both overridden, equals and hashCode must
use the same set of fields.
If two objects are equal, then their hashCode values must be
equal as well.
If the object is immutable, then hashCode is a candidate for
caching and lazy initialization.
It's a popular misconception that hashCode provides a unique
identifier for an object. It does not.
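The rules above can be illustrated with a small value class (Point is my example, not from the slides): equals and hashCode are overridden together, over the same fields, so equal objects get equal hash codes:

```java
import java.util.Objects;

public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;     // same fields as hashCode
    }

    @Override public int hashCode() {
        return Objects.hash(x, y);       // same fields as equals
    }

    public static void main(String[] args) {
        Point a = new Point(1, 2), b = new Point(1, 2);
        if (!a.equals(b)) throw new AssertionError();
        // equal objects MUST have equal hash codes (the contract);
        // unequal objects MAY share a hash code, since it is not a unique id
        if (a.hashCode() != b.hashCode()) throw new AssertionError();
    }
}
```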

Caching the Hash Code


A classic time-space tradeoff is caching the hash code:
Each String object stores internally the value of its hashCode.
Why? Because computing the hashCode is expensive.
This works only because Strings are immutable.
If the String were allowed to change, it would invalidate the hashCode, and the hashCode
would have to be reset back to 0.

Excerpt of the String class hashCode

5.6

Hash Tables w/ Worst-Case O(1) Access


We want to obtain O(1) worst-case cost. Why?
In applications such as hardware implementations of lookup tables (LUTs) for
routers and memory caches, it is important that the search have a definite
(constant) amount of completion time.

If N is known, and we are allowed to rearrange
items as they are inserted, then O(1) worst-case
cost is achievable for searches.
There are different solutions to this problem:
Perfect Hashing
Cuckoo Hashing
Hopscotch Hashing
5.7

Perfect Hashing
If we have N items, how do we lower the probability
of collisions?
One approach:
Use a separate chaining implementation, and
keep each list at most a constant number of items.
As we make more lists, the lists will on average be shorter.

Problems with this separate chaining approach:


Even with lots of lists, we might still get unlucky.
The number of lists might be unreasonably large.
5.7.1

Perfect Hashing
Even with lots of lists, we might still get unlucky:
Choose M (the number of lists) to be sufficiently large
that the probability of no collisions is at least 1/2.
If a collision is detected, we simply clear out the
table and try again using a different hash function
that is independent of the first.
Keep trying until we get no collisions.
The expected number of trials will be at most 2
(since the success probability is 1/2).
5.7.1

Perfect Hashing
The number of lists might be unreasonably large:
How large does M need to be?
M needs to be quite large: M = Theta(N^2).
If M = N^2, the table is collision free with probability at
least 1/2.
Theorem 5.2

If N balls are placed into M = N^2 bins, the probability
that some bin has more than one ball is less than 1/2.
(See textbook for the proof)

5.7.1

Perfect Hashing
Using N^2 lists is impractical.
A more practical implementation:
Use only N bins, but resolve the collisions in each bin
by using hash tables instead of linked lists.
The bins are expected to have only a few items each,
so the hash table for each bin can be quadratic in the bin
size.

5.7.1

Perfect Hashing
Perfect hashing table using secondary hash tables

5.7.1

Perfect Hashing
The scheme of perfect hashing:

The primary hash table can be constructed several times if
the number of collisions that are produced is higher than
required.
Each secondary hash table will be constructed using a
different hash function until it is collision free.

Theorem 5.3

If N items are placed into a primary hash table containing
N bins, then the total size of the secondary hash tables has
expected value at most 2N.

Perfect hashing works if the items are all known in
advance.

5.7.1

Cuckoo Hashing
If N items are randomly tossed into N bins, the
size of the largest bin is expected to be Theta(log N /
log log N).
If, at each toss, two bins were randomly chosen
and the item was tossed into the emptier bin
(at the time), then the size of the largest bin
would only be Theta(log log N), a significantly lower
number.
This is the so-called "power of two choices."
5.7.2

Cuckoo Hashing
Given N items, we maintain two tables:
each more than half empty, and
each with an independent hash function for assigning each
item to a position in that table.

Cuckoo hashing maintains the invariant that an
item is always stored in one of these two
locations.

5.7.2

Cuckoo Hashing
Item A can be at either position 0 in Table 1 or
position 2 in Table 2.
A search in a cuckoo hash table requires at most two
table accesses.

5.7.2
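The two-access search invariant can be sketched directly (class name and the particular hash functions are mine, chosen only for illustration): an item can live in exactly one of its two candidate positions, so a lookup checks at most two cells:

```java
public class CuckooLookup {
    static String[] table1 = new String[3], table2 = new String[3];

    static int hash1(String s) { return Math.floorMod(s.hashCode(), 3); }
    static int hash2(String s) { return Math.floorMod(s.hashCode() / 3, 3); }

    // A search needs at most two table accesses: one per table.
    static boolean contains(String s) {
        return s.equals(table1[hash1(s)]) || s.equals(table2[hash2(s)]);
    }

    public static void main(String[] args) {
        table1[hash1("A")] = "A";          // the item lives in one of its two spots
        if (!contains("A")) throw new AssertionError();
        if (contains("B")) throw new AssertionError();
    }
}
```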

Cuckoo Hashing
The cuckoo hashing algorithm: to insert a new item
x, first make sure it is not already there.
If the first table location is empty, the item can be
placed.

5.7.2


Cuckoo Hashing
To insert B, we can add it at location 0 in Table 1 or location 0 in Table
2.
Table 1 is already occupied by A in position 0.
Cuckoo will preemptively displace A and does not bother to look
at Table 2.

5.7.2

Cuckoo Hashing
Insertion of C is straightforward.
For insertion of D with hash locations (1, 0), the Table 1
location is already taken, but we don't look at the
Table 2 location.

5.7.2

Cuckoo Hashing
E can be easily inserted.
In order to insert F, we need to displace E, then A, and then B.

5.7.2

Cuckoo Hashing
But we cannot successfully insert G!
G has hash locations (1, 2):
Displace D,
Displace B,
Displace A,
Displace E,
Displace F,
Displace C,
Displace G: CIRCULAR DEPENDENCE!
5.7.2

Cuckoo Hashing
Fortunately, if the table's load factor is below 0.5,
the probability of a cycle is very low.
If circular dependence really occurs, we can
simply rebuild the tables with new hash functions
after a certain number of displacements are
detected.

5.7.2

Cuckoo Hashing
Cuckoo hash table implementation:
Allow an arbitrary number of hash functions.
Use a single array that is addressed by all the hash
functions (instead of two separately addressable hash
tables).
Specify the maximum load to be 0.4 (auto expansion if
higher load).
Specify how many rehashes we will perform.

5.7.2

Hopscotch Hashing
Hopscotch hashing: bound the maximal length of
the probe sequence by a predetermined constant
that is optimized to the underlying computer's
architecture. For example: MAX_DIST = 4.
This gives constant-time lookups in the worst
case.
The lookup can be parallelized to simultaneously
check the bounded set of possible locations.
5.7.3

Hopscotch Hashing
The hops tell which of the positions in the block are
occupied with cells containing this hash value. Thus
Hop[8] = 0010 indicates that only position 10
currently contains items whose hash value is 8, while
positions 8, 9, and 11 do not.

5.7.3

Hopscotch Hashing
Attempting to insert H: linear probing suggests
location 13, but that is too far, so we evict G from
position 11 to find a closer position.

5.7.3

Hopscotch Hashing
Attempting to insert I: linear probing suggests location 14, but
that is too far; consulting Hop[11], we see that G can move
down, leaving position 13 open. Consulting Hop[10] gives no
suggestions. Hop[11] does not help either (why?), so Hop[12]
suggests moving F.

5.7.3

Hopscotch Hashing
Insertion of I continues: next B is evicted, and
finally we have a spot that is close enough to the
hash value, and we can insert I.

5.7.3

Extendible Hashing
What if the full amount of data is too large to fit in
memory?
Our main concern is the number of disk accesses to get a
given data item.
N items to store, M items fit on each disk block.
Collisions will cause a number of blocks to be examined,
resulting in significant disk read cost.
When the hash table becomes too full, rehashing will be needed.
The cost will be O(N) disk accesses.

Extendible hashing:
Search: two disk accesses.
Insertion: few disk accesses.

5.9

Extendible Hashing
Original Data

5.9

Extendible Hashing
After insertion of 100100 and directory split

5.9

Extendible Hashing
After insertion of 000000 and leaf split

5.9

6. Priority Queues

Dr. Angus Yeung


Agenda for Lecture 14


Book-Keeping
Download of Textbook Source Code

Hashing (Chapter 5)
Hash Tables with Worst-Case O(1) Access
Hopscotch Hashing

Extendible Hashing

Priority Queues (Heaps) (Chapter 6)


Model
Simple Implementations
Binary Heap
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #7
LECTURE #14

Textbook Source Code


You may download the textbook's source code
here:
http://users.cis.fiu.edu/~weiss/dsaajava3/code/

Prime Number Checking


We discussed this method last time:

This is a shortcut.

Prime Number Checking


The Sieve of Eratosthenes
Finds all prime numbers up to any given limit by
iteratively marking the multiples of each prime.
A non-prime must be a composite: prime x another number.

(Figure: the numbers 2 through 30, with the multiples of 2, 3, and 5
crossed out in successive passes, leaving the primes
2, 3, 5, 7, 11, 13, 17, 19, 23, 29.)

HASHING


PRIORITY QUEUES

Chapter Overview
Priority queues are used in many applications:
Queue for the print jobs sent to a printer:
1-page jobs should be prioritized over 100-page jobs.

Operating system scheduler for a multiuser environment:


Short jobs should finish as fast as possible, taking
precedence over jobs that have already been running.

In this chapter, we will discuss:


Efficient implementation of the priority queue ADT
Uses of priority queues
Advanced implementations of priority queues
6.0

Model
Two main operations for a priority queue:
insert: the equivalent of the enqueue operation
deleteMin: the equivalent of the dequeue operation

Basic model of a priority queue:
6.1

Simple Implementations
There are several obvious ways to implement a
priority queue:
Linked List

O(1) for insertions at the front


O(N) for traversing the list to delete the minimum

Sorted Linked List

O(N) for insertions


O(1) for deleting the minimum

Binary Search Tree (BST)

O(log N) for insertions


O(log N) for deleting the minimum
Repeatedly deleting the minimum will hurt the balance of the tree
6.2

Binary Heap
A binary heap (or just heap) is a binary tree that is complete:
All levels of the tree are full, except possibly the bottom level,
which is filled from left to right.

6.3

Binary Heap
Conceptually, a heap is a binary tree,
but we can implement it as an array.
For any element in array position i:
The left child is at position 2i
The right child is at position 2i + 1
The parent is at position i / 2 (integer division)

6.3
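The index arithmetic above can be checked in a few lines. A sketch using a sample min-heap stored from index 1 (the values here are just an example heap, with array[0] unused):

```java
public class HeapIndexing {
    // Sample heap stored from index 1; array[0] is unused.
    static int[] heap = {0, 13, 21, 16, 24, 31, 19, 68, 65, 26, 32};

    static int leftChild(int i)  { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }
    static int parent(int i)     { return i / 2; }   // integer division

    public static void main(String[] args) {
        if (heap[leftChild(1)] != 21)  throw new AssertionError(); // root's children
        if (heap[rightChild(1)] != 16) throw new AssertionError();
        if (heap[parent(5)] != 21)     throw new AssertionError(); // parent of 31
    }
}
```

Starting the array at index 1 makes all three formulas work without special cases.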

Heap-Order Property
We want to find the minimum value (highest
priority) very quickly:
Make the minimum value always at the root.
Apply this rule also to roots of subtrees.
This is a weaker rule than for a binary search tree:
It is not necessary that values in the left subtree be less
than the root value and values in the right subtree be
greater than the root value.
6.3

Heap-Order Property
Two complete trees (only the left tree is a heap)

6.3

Heap Insertion
Insertion strategy, percolate up:
Repeatedly do a heap insertion on the list of values.
Percolate the hole up each time from the bottom of the heap.

Inserting 14
6.3

Heap Insertion
Procedure to insert into a binary heap:

6.3
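The percolate-up insertion can be sketched as follows (a simplified int-only version along the lines of the textbook's generic code; the fixed capacity and class name are mine). Note the sentinel trick: placing x in array[0] stops the loop at the root without an explicit bounds check:

```java
public class MinHeap {
    private int[] array = new int[100];   // fixed capacity for the sketch
    private int currentSize = 0;

    // Percolate up: slide the hole toward the root until x fits.
    public void insert(int x) {
        int hole = ++currentSize;
        for (array[0] = x; x < array[hole / 2]; hole /= 2)
            array[hole] = array[hole / 2];   // parent slides down into the hole
        array[hole] = x;
    }

    public int findMin() { return array[1]; }

    public static void main(String[] args) {
        MinHeap h = new MinHeap();
        for (int x : new int[] {13, 21, 16, 24, 31, 19, 68, 65, 26, 32}) h.insert(x);
        h.insert(14);                        // percolates up past larger parents
        if (h.findMin() != 13) throw new AssertionError();
    }
}
```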

Delete the Minimum


deleteMin strategy, percolate down:

Finding the minimum is easy; the hard part is removing it.


A hole is created at the root when the minimum is removed.
Slide the smaller of the hole's children into the hole.

6.3

Delete the Minimum


Method to perform deleteMin in a binary heap

6.3
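The percolate-down deletion can be sketched in the same style (an int-only simplification; the constructor assumes the input already satisfies heap order):

```java
public class MinHeapDelete {
    private int[] array;
    private int currentSize;

    public MinHeapDelete(int[] items) {      // assumes items already form a heap
        array = new int[items.length + 10];
        currentSize = items.length;
        System.arraycopy(items, 0, array, 1, items.length);
    }

    public int deleteMin() {
        int min = array[1];
        array[1] = array[currentSize--];     // last element fills the root hole...
        percolateDown(1);                    // ...then slides down to its place
        return min;
    }

    private void percolateDown(int hole) {
        int tmp = array[hole];
        for (int child; hole * 2 <= currentSize; hole = child) {
            child = hole * 2;
            if (child != currentSize && array[child + 1] < array[child])
                child++;                     // pick the smaller child
            if (array[child] < tmp) array[hole] = array[child];
            else break;
        }
        array[hole] = tmp;
    }

    public static void main(String[] args) {
        MinHeapDelete h = new MinHeapDelete(
            new int[] {13, 14, 16, 19, 21, 19, 68, 65, 26, 32, 31});
        if (h.deleteMin() != 13) throw new AssertionError();
        if (h.deleteMin() != 14) throw new AssertionError();  // order preserved
    }
}
```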

buildHeap
Each dashed line in the figures corresponds to two comparisons
during a call to percolateDown():
One to find the smaller child.
One to compare the smaller child to the node.

For the bound on the running time of buildHeap(), we must


bound the number of dashed lines.
Each call to percolate down a node can possibly go all the way down to the
bottom of the heap.
There can be as many dashed lines from a node as the height of the node.

The maximum number of dashed lines is the sum of the heights of all the
nodes in the heap.
We prove that this sum is O(N).

6.4

Building Heap
Sketch of buildHeap

6.4

Building Heap
Initial heap and after percolateDown(7)

6.4

Building Heap
percolateDown(6) and percolateDown(5)

6.4

Building Heap
percolateDown(4) and percolateDown(3)

6.4

Building Heap
percolateDown(2) and percolateDown(1)

6.4

Running Time for buildHeap


Prove: for a perfect binary tree of height h containing 2^(h+1) - 1
nodes, the sum of the heights of the nodes is
S = 2^(h+1) - 1 - (h+1).
There is one node (the root) at height h, 2 nodes at height h-1, 4 nodes at
height h-2, and in general, 2^i nodes at height h-i.

S = sum from i=0 to h of 2^i (h - i)
  = h + 2(h-1) + 4(h-2) + 8(h-3) + ... + 2^(h-1)(1)

(a)

Multiply both sides by 2:

2S = 2h + 4(h-1) + 8(h-2) + 16(h-3) + ... + 2^h (1)

(b)

Subtract (b) - (a). Note that 2h - 2(h-1) = 2, 4(h-1) - 4(h-2) = 4, etc.

S = -h + 2 + 4 + 8 + ... + 2^(h-1) + 2^h
  = (2^(h+1) - 1) - (h + 1)

6.4

Running Time for buildHeap


S = (2^(h+1) - 1) - (h + 1)
A complete tree is not necessarily a perfect binary tree, but it
contains between 2^h and 2^(h+1) nodes. Therefore, S = O(N),
and so buildHeap() runs in O(N) time.

6.4

Agenda for Lecture 15


Book-Keeping
Assignment #3 available on Canvas

Priority Queues (Heaps) (Chapter 6)


Leftist Heaps
Skew Heaps
Binomial Queues

CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #8
LECTURE #15

Leftist Heaps
A leftist heap is a heap that supports efficient
merging.
A node insertion into a leftist heap is a merge with a
one-node tree.
Deletion of the root splits a leftist heap into two
trees, which are then merged back together.

A leftist heap tends to be unbalanced.

6.6

Null Path Length


The null path length npl(X) of any node X is the
length of the shortest path from X to a
node that has 0 or 1 child.
The npl of a node with 0 or 1 child is 0.
npl(null) = -1.

The leftist heap property:


For every node X, npl(left child) >= npl(right child).
6.6

Null Path Length


Null path lengths for two trees; only the left tree
is leftist.

6.6

Leftist Heap Merge


To merge two leftist heaps H1 and H2:
If either heap is empty, return the other one.
Otherwise, compare their roots and recursively
merge the heap with the larger root into the
right subheap of the heap with the smaller root.
Make the merged heap the right child of the
heap with the smaller root.
Swap the left and right children if necessary
to maintain the leftist property.
6.6

Leftist Heap Merge


Two leftist heaps H1 and H2.

6.6

Leftist Heap Merge


Result of merging H2 with H1's right subheap.

6.6


Leftist Heap Merge


Result of attaching the leftist heap of the previous figure
as H1's right child.

6.6

Leftist Heap Merge


Result of swapping the children of H1's root.

6.6

Leftist Heap Merge


Result of merging the right paths of H1 and H2.

6.6

Implementation of Leftist Heap


An insertion of a new value is a merge with a
single node.

6.6

Implementation of Leftist Heap


A deletion (of the root) splits the heap into two
parts, which are then merged back together.

6.6

Implementation of Leftist Heap


Driver routines for merging leftist heaps

6.6

Implementation of Leftist Heap


Actual routine to merge leftist heaps

6.6
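The merge routine described above (larger root merged into the smaller root's right subheap, then a conditional child swap to restore the leftist property) can be sketched with ints; the class layout is my simplification of the textbook's generic version:

```java
public class LeftistHeap {
    private static class Node {
        int element, npl;
        Node left, right;
        Node(int element) { this.element = element; }
    }

    private Node root;

    public void insert(int x) { root = merge(new Node(x), root); }  // one-node merge

    public int deleteMin() {                       // split into two, merge back
        int min = root.element;
        root = merge(root.left, root.right);
        return min;
    }

    private static Node merge(Node h1, Node h2) {
        if (h1 == null) return h2;                 // empty heap: return the other
        if (h2 == null) return h1;
        if (h2.element < h1.element) { Node t = h1; h1 = h2; h2 = t; }
        h1.right = merge(h1.right, h2);            // merge into the right subheap
        if (h1.left == null || h1.left.npl < h1.right.npl) {
            Node t = h1.left; h1.left = h1.right; h1.right = t;  // restore leftist property
        }
        h1.npl = (h1.right == null) ? 0 : h1.right.npl + 1;
        return h1;
    }

    public static void main(String[] args) {
        LeftistHeap h = new LeftistHeap();
        for (int x : new int[] {6, 12, 7, 18, 24, 8}) h.insert(x);
        if (h.deleteMin() != 6) throw new AssertionError();
        if (h.deleteMin() != 7) throw new AssertionError();
    }
}
```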

Skew Heaps
Skew heaps:
A self-adjusting version of the leftist heap.
Simple to implement.
Binary trees with heap order but without any structural
constraint on the trees.
Unlike leftist heaps, skew heaps:
Contain no npl information in any node.
Perform an unconditional swap:
Leftist heaps check whether the left and right children satisfy the
leftist heap structure property and swap them only if they do not.
Skew heaps always swap, with one exception: the largest of all the nodes on the right path
does not have its children swapped.
6.7

Skew Heaps
Two skew heaps H1 and H2:

6.7

Skew Heaps
Result of merging H2 with H1's right subheap

6.7

Skew Heaps
Result of merging skew heaps H1 and H2

6.7

Binomial Queues
Binomial queues:
Similar to leftist and skew heaps in supporting the merge,
insert, and deleteMin operations.
Similar to leftist and skew heaps in having O(log N) worst-case
time per operation for those three operations.
But insertions take only constant time on average.

6.8

Binomial Queues
A binomial queue is a collection of heap-ordered trees, known as a forest.
Binomial trees B0, B1, B2, B3, and B4.

6.8

Binomial Queue Merging


Two binomial queues H1 and H2.

6.8

Binomial Queue Merging


Merge of the two B1 trees in H1 and H2

6.8

Binomial Queue Merging


Binomial queue H3: the result of merging H1 and
H2

6.8

Binomial Queue Insertion


Inserting 1 through 7 in order:

6.8

Binomial Queue deleteMin


Performing a deleteMin on H3:

6.8

Binomial Queue deleteMin


Binomial queue H: B3 with 12 removed:

6.8

Binomial Queue deleteMin


Result of applying deleteMin to H3:

6.8

Binomial Queue deleteMin


Binomial queue H3 drawn as a forest:

6.8

Binomial Queue Implementation


The binomial queue is an array of binomial trees
arranged in decreasing rank:

6.8

7. Sorting

Dr. Angus Yeung


Agenda for Lecture 17


Book-Keeping
Assignment #3
Quiz #3: 4/1/15
Mid-term: 4/6/15

Sorting (Chapter 7)
Insertion Sort
Shellsort
Heapsort
Mergesort
CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #9
LECTURE #17

Insertion Sort
Number of inputs, N = 6

Insertion sort consists

of N-1 = 5 passes.

Positions 0 through
p-1 are already sorted.

The element at position p is then inserted into
the sorted portion.

7.2

Implementation of Insertion Sort


In our implementation of insertion sort, we are using comparison-based sorting.

The objects being sorted are


of type Comparable.

The element at position p is


stored in tmp.

7.2
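The comparison-based implementation described above can be sketched as follows, with tmp holding the element at position p while larger elements slide right (essentially the textbook's approach; the class name is mine):

```java
public class InsertionSort {
    public static <T extends Comparable<? super T>> void sort(T[] a) {
        for (int p = 1; p < a.length; p++) {     // positions 0..p-1 already sorted
            T tmp = a[p];                        // element at position p
            int j = p;
            for (; j > 0 && tmp.compareTo(a[j - 1]) < 0; j--)
                a[j] = a[j - 1];                 // slide larger elements right
            a[j] = tmp;                          // drop tmp into the hole
        }
    }

    public static void main(String[] args) {
        Integer[] a = {34, 8, 64, 51, 32, 21};
        sort(a);
        if (!java.util.Arrays.equals(a, new Integer[] {8, 21, 32, 34, 51, 64}))
            throw new AssertionError();
    }
}
```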

Analysis of Insertion Sort


The number of tests in the inner loop is at most
p + 1 for each value of p. Summing over all p
gives a total of 2 + 3 + ... + N = O(N^2).

If the input is pre-sorted,

the running time is O(N).
7.2

A Lower Bound for Simple Sorting Algorithms

An inversion in an array of numbers is any ordered


pair (i, j) having the property that i < j but a[i] > a[j].
The two values are out of order.
Assume we're sorting from lowest to highest.
For example: [34, 8, 64, 51, 32, 21] has 9 inversions:
(34, 8), (34, 32), (34, 21), (64, 51), (64, 32), (64, 21), (51,
32), (51, 21), and (32, 21).

The number of inversions is exactly the number of swaps


that need to be performed by insertion sort.
A sorted array has no inversions.
7.3

A Lower Bound for Simple Sorting Algorithms


Theorem 7.1 The average number of inversions in an
array of N distinct elements is N(N-1)/4.
Proof:
Consider L, a list of elements, and Lr, the same list in reverse order.
Any pair (x, y) represents an inversion in either L or Lr.
Since there are N(N-1)/2 pairs, the average list has half this amount, or
N(N-1)/4 inversions.

Theorem 7.2 Any algorithm that sorts by exchanging


adjacent elements requires Ω(N²) time on average.
Proof:
The average number of inversions is N(N-1)/4, or Ω(N²).
Each swap removes only one inversion, so Ω(N²) swaps are required.
7.3

A Lower Bound for Simple Sorting Algorithms

The lower-bound proof is valid not only for


insertion sort but also for other simple algorithms
such as bubble sort and selection sort.
A sorting algorithm makes progress by eliminating
inversions, and to run efficiently, it must eliminate
more than just one inversion per exchange.

7.3

Shellsort
Other notes on insertion sort
Insertion sort is fast if the array is nearly sorted
Parallelism? If we can swap non-adjacent values, we may
be able to remove more than one inversion at a time.
If we can get the array nearly sorted as soon as possible,
insertion sort can finish the job quickly.

Donald Shell invented the Shellsort algorithm in 1959


based on these observations.
Shellsort was one of the first algorithms to break the
quadratic time barrier.
7.4

Shellsort
Basic principle:
We start by comparing elements that are distant
The distance, h, between comparisons decreases as
the algorithm runs until the last phase, in which
adjacent elements are compared.
This is referred to as diminishing increment sort.

7.4

Shellsort
Shellsort uses a sequence, h1, h2, ..., ht, called the
increment sequence.
Like insertion sort, except that we compare values
that are h elements apart in the list: a[i] and a[i+hk]
hk diminishes after completing a pass, e.g., 5, 3, and 1.
The file is said to be hk-sorted. For example, 5-sorted,
3-sorted, etc.
The final value, h1, must be 1. So the final pass is
always a regular insertion sort.
7.4

A Shellsort Example
Shellsort after each pass, using [1, 3, 5] as the increment
sequence.

The choice of h values affects how long the sort takes.

7.4

Implementation of Shellsort
Shellsort routine using Shell's increments (better
increments are possible)
Shell's increments:
ht = ⌊N/2⌋ and hk = ⌊hk+1/2⌋

7.4
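A Shellsort routine using Shell's increments can be sketched as follows; the class name is illustrative. Each phase is an insertion sort over elements that are `gap` apart, and the final `gap = 1` phase is a plain insertion sort.

```java
// Shellsort with Shell's increments: gap starts at N/2 and halves
// each phase, so the last phase (gap == 1) is ordinary insertion sort.
public class ShellsortDemo {
    public static <T extends Comparable<? super T>> void shellsort(T[] a) {
        for (int gap = a.length / 2; gap > 0; gap /= 2) {
            for (int i = gap; i < a.length; i++) {
                T tmp = a[i];
                int j = i;
                // insertion sort on the subsequence gap apart
                while (j >= gap && tmp.compareTo(a[j - gap]) < 0) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = tmp;
            }
        }
    }
}
```

Swapping in Hibbard's or Sedgewick's increments only changes the outer loop's gap sequence.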

Worst-Case Analysis of Shellsort


Shellsort is difficult to analyze
Theorem 7.3 The worst-case running time of
Shellsort, using Shell's increments, is Θ(N²).
Proof:

provide cases for both O(N²) and Ω(N²).


See textbook page 276.

Theorem 7.4 The worst-case running time of


Shellsort using Hibbard's increments is Θ(N^(3/2)).
Proof:

see textbook page 277.

Hibbard's increments: 1, 3, 7, ..., 2^k − 1


7.4

Insertion Sort versus Shellsort


Insertion sort is simple to implement but the average
running time is slow at Θ(N²)
It swaps only adjacent values
An element may traverse a long way through the array
during a pass to arrive at its proper place.

Shellsort is also simple to implement but the average


running time is much improved at O(N^(3/2))
Early passes with large h make it easier for later passes
with smaller h to sort
The choice of a good increment sequence for h is
important.

Heapsort
Heapsort is based on using a priority queue, with running time O(N log N)
To sort N values into increasing order:
Build a heap: running time = O(N)
Perform N deletions: O(log N) each
Sorted values can be appended to the end of the underlying array

The element removed will be


appended to the end.

7.5
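The heapsort idea above, using a max-heap built in the array itself so each deleted maximum can be dropped into the freed slot at the end, might be sketched like this (class and method names are illustrative):

```java
// Heapsort: build a max-heap in place (O(N)), then repeatedly swap the
// root (maximum) to the end of the shrinking heap and percolate down.
public class HeapsortDemo {
    private static int leftChild(int i) { return 2 * i + 1; }

    private static <T extends Comparable<? super T>> void percDown(T[] a, int i, int n) {
        T tmp = a[i];
        for (int child; leftChild(i) < n; i = child) {
            child = leftChild(i);
            if (child != n - 1 && a[child].compareTo(a[child + 1]) < 0)
                child++;                              // pick the larger child
            if (tmp.compareTo(a[child]) < 0)
                a[i] = a[child];                      // move child up
            else
                break;
        }
        a[i] = tmp;
    }

    public static <T extends Comparable<? super T>> void heapsort(T[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)   // build max-heap: O(N)
            percDown(a, i, a.length);
        for (int i = a.length - 1; i > 0; i--) {      // N-1 deleteMaxes
            T t = a[0]; a[0] = a[i]; a[i] = t;        // move max to the end
            percDown(a, 0, i);
        }
    }
}
```

Using a max-heap (instead of deleteMin into a second array) is what lets the sorted output share the heap's array.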

Mergesort
Mergesort uses the strategy of Divide and Conquer
Divide: split the list of values into two halves and
recursively sort each half.
Conquer: merge the two sorted halves back together

7.6

Mergesort Ch.2 Example


We discussed the Divide and Conquer
strategy for the Max Subsequence Sum
Problem in Chapter 2: Algorithm 3

Mergesort Illustrated
The basic merging algorithm takes two input
arrays A and B, an output array C, and three
counters, Actr, Bctr, and Cctr

7.6

Mergesort Illustrated
If the array A contains 1, 13, 24, 26, and B contains 2, 15,
27, 38, then the algorithm proceeds as follows:
First, a comparison is done between 1 and 2.
1 is added to C, and then 13 and 2 are compared.

7.6

Mergesort Illustrated
2 is added to C, and then 13 and 15 are compared.

7.6

Mergesort Illustrated
13 is added to C, and then 24 and 15 are compared. This
proceeds un'l 26 and 27 are compared.

7.6

Mergesort Illustrated
26 is added to C, and the A array is exhausted.

7.6

Mergesort Illustrated
The remainder of the B array is then copied to C.

7.6
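The merging steps just illustrated (three counters walking A, B, and C) can be written directly in Java; here the two "input arrays" are the two sorted halves of one array, as the recursive mergesort uses them. Names like `MergeDemo` are illustrative.

```java
// Mergesort: merge() combines sorted halves a[left..mid] and
// a[mid+1..right] into tmp using three counters, then copies back.
public class MergeDemo {
    public static void merge(int[] a, int[] tmp, int left, int mid, int right) {
        int actr = left, bctr = mid + 1, cctr = left;
        while (actr <= mid && bctr <= right)          // compare front elements
            tmp[cctr++] = (a[actr] <= a[bctr]) ? a[actr++] : a[bctr++];
        while (actr <= mid)   tmp[cctr++] = a[actr++];   // rest of A
        while (bctr <= right) tmp[cctr++] = a[bctr++];   // rest of B
        for (int i = left; i <= right; i++)
            a[i] = tmp[i];
    }

    public static void mergesort(int[] a, int[] tmp, int left, int right) {
        if (left < right) {
            int mid = (left + right) / 2;
            mergesort(a, tmp, left, mid);             // sort left half
            mergesort(a, tmp, mid + 1, right);        // sort right half
            merge(a, tmp, left, mid, right);          // combine
        }
    }
}
```

With A = {1, 13, 24, 26} and B = {2, 15, 27, 38} stored side by side, `merge` reproduces exactly the comparison sequence in the slides.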

Analysis of Mergesort
What is the running time for Mergesort?
Let T(N) be the time to sort N values.
For N=1, the time to mergesort is constant, O(1)
Otherwise, it takes 2T(N/2) for the recursive mergesorts, plus
N to do the merge

We have a recurrence relation: T(1) = 1; T(N) = 2T(N/2) + N
7.6

Analysis of Mergesort
Divide both sides by N:

T(N)/N = T(N/2)/(N/2) + 1

Telescoping down through N/2, N/4, ..., 2 gives
T(N)/N = T(1)/1 + log N, so T(N) = N + N log N = O(N log N)
7.6

Agenda for Lecture 20


Book-Keeping
Review of Quiz #3
Review of Mid-term
Review of Assignment #3

Sorting (Chapter 7)
Quicksort
Picking the Pivot
Partitioning Strategy

WEEK #10
LECTURE #20

Quicksort
Quicksort is one of the most elegant and useful algorithms
in computer science.
A fast divide-and-conquer recursive algorithm
Very tight and highly optimized inner loop

Performance
Average running time is O(N log N)
Worst-case performance is O(N²)

Basic idea:
Find a good pivot value in the list
Recursively sort the two sublists
Similar to mergesort but does not require merging or a temp array.
7.7

Quicksort: Algorithm
Simple recursive sorting algorithm
Input is divided into three sublists:
smaller, same, and larger

This is the pivot item.

Recursive call on the smaller


and larger sublists

7.7

Quicksort: Example
1. Select Pivot
2. Partition
3. Recursive Sort
7.7

Why is Quicksort faster?


1. Select Pivot
2. Partition
Performed in place and very efficient

3. Recursive Sort
Sublists may not be of equal size.
7.7

Quicksort: Picking the Pivot


Use the first element as the pivot
Acceptable: if the input is random
Poor result: if the input is presorted or in reverse order

A safer maneuver: choose the pivot randomly


But random number generation is an expensive operation

Median-of-Three Partitioning
The median of the array is hard to calculate
Alternative: pick three elements randomly and use the median
of these three as the pivot.
Better for implementation: pick the left, right and center elements
Result: reduces the number of comparisons by about 14%
7.7.1

Quicksort: Partitioning Strategy


Here is one partitioning strategy used in practice:
1. Move the pivot to the last element
2. Move i forward and j backward
3. Swap a[i] and a[j] if needed
4. Stop when i and j cross over
5. Move the pivot to the middle, at i
7.7.2
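The pivot selection and partitioning strategy above can be combined into one Java sketch. This follows the common median-of-three variant (the median is hidden at `right - 1` rather than the very last slot, which is a standard refinement); names and the cutoff value are illustrative.

```java
// Quicksort with median-of-three pivot selection and the i/j crossing
// partition; subarrays at or below CUTOFF fall back to insertion sort.
public class QuicksortDemo {
    private static final int CUTOFF = 3;

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    // sort a[left], a[center], a[right]; hide the median pivot at right-1
    private static int median3(int[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[center] < a[left])  swap(a, left, center);
        if (a[right]  < a[left])  swap(a, left, right);
        if (a[right]  < a[center]) swap(a, center, right);
        swap(a, center, right - 1);
        return a[right - 1];
    }

    public static void quicksort(int[] a, int left, int right) {
        if (left + CUTOFF <= right) {
            int pivot = median3(a, left, right);
            int i = left, j = right - 1;
            for (;;) {
                while (a[++i] < pivot) {}             // move i forward
                while (a[--j] > pivot) {}             // move j backward
                if (i < j) swap(a, i, j);             // swap if needed
                else break;                           // stop when they cross
            }
            swap(a, i, right - 1);                    // restore pivot at i
            quicksort(a, left, i - 1);                // sort smaller sublist
            quicksort(a, i + 1, right);               // sort larger sublist
        } else {                                      // tiny subarray
            for (int p = left + 1; p <= right; p++) { // insertion sort
                int tmp = a[p], k = p;
                while (k > left && tmp < a[k - 1]) {
                    a[k] = a[k - 1];
                    k--;
                }
                a[k] = tmp;
            }
        }
    }
}
```

The cutoff is needed because median-of-three degenerates on subarrays of fewer than three elements.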

Agenda for Lecture 21


Book-Keeping
Sorting (Chapter 7)

WEEK #11
LECTURE #21

7.7 Quicksort
7.7.5 Analysis of Quicksort
7.7.6 A Linear-Expected-Time Algorithm for Selection

7.8 A General Lower Bound for Sorting


7.11 Linear-Time Sorts: Bucket Sort and Radix Sort
7.12 External Sorting

Quicksort: Analysis
What is the running time to quicksort a list of N?
Partition the array into two subarrays
(linear, cN, time).
A recursive call on each subarray.

A recurrence relation: T(N) = T(i) + T(N − i − 1) + cN

where i is the number of values in the left partition.

The performance of quicksort is highly dependent on ...


... the quality of the choice of pivot.


7.7.5

Quicksort: Worst-Case Analysis

The pivot is always the smallest value of the partition, so i = 0:
T(N) = T(N − 1) + cN, which telescopes to T(N) = O(N²).

7.7.5

Quicksort: Best-Case Analysis


The pivot is always the median, so each subarray is half the size:
T(N) = 2T(N/2) + cN, which gives T(N) = O(N log N).

7.7.5

Quicksort: Avg-Case Analysis


Each size for a subarray after partitioning is equally likely, with probability 1/N:

T(i) = T(N − i − 1) = (1/N) Σ_{j=0}^{N−1} T(j)

Therefore:

T(N) = (2/N) Σ_{j=0}^{N−1} T(j) + cN

Multiply through by N:

N T(N) = 2 Σ_{j=0}^{N−1} T(j) + cN²    (a)

(N − 1) T(N − 1) = 2 Σ_{j=0}^{N−2} T(j) + c(N − 1)²    (b)

Subtract (a) − (b):

N T(N) − (N − 1) T(N − 1) = 2T(N − 1) + 2cN − c

7.7.5

Quicksort: Avg-Case Analysis


N T(N) − (N − 1) T(N − 1) = 2T(N − 1) + 2cN − c
Rearrange and drop the insignificant c:

N T(N) = (N + 1) T(N − 1) + 2cN
Divide through by N(N + 1):

T(N)/(N + 1) = T(N − 1)/N + 2c/(N + 1)

Telescope:

T(N − 1)/N = T(N − 2)/(N − 1) + 2c/N
T(N − 2)/(N − 1) = T(N − 3)/(N − 2) + 2c/(N − 1)
...
T(2)/3 = T(1)/2 + 2c/3

7.7.5

Quicksort: Avg-Case Analysis


Add and cancel:

T(N)/(N + 1) = T(1)/2 + 2c Σ_{i=3}^{N+1} 1/i

Recall the harmonic number: Σ_{i=3}^{N+1} 1/i ≈ log_e N

And so:

T(N)/(N + 1) = O(log N)

Therefore:

T(N) = O(N log N)

7.7.5

A General Lower Bound for Sorting


For any sorting algorithm that uses only
comparisons, O(N log N) is as good as we can do.
Mergesort and heapsort are optimal to within a
constant factor.

How can we prove that?

7.8

A General Lower Bound for Sorting


Prove: Any sorting algorithm that uses only
comparisons requires ⌈log(N!)⌉ comparisons in the
worst case and log(N!) comparisons on average.
We can use decision trees for the proof.
log(N!) = Θ(N log N)

7.8

A General Lower Bound for Sorting


A decision tree for a three-element sort
1. Worst case: depth of the deepest leaf
2. Avg case: average depth of the leaves

7.8

A General Lower Bound for Sorting


Lemma 7.1: Let T be a binary tree of depth d. Then T
has at most 2^d leaves.
Lemma 7.2: A binary tree with L leaves must have
depth at least ⌈log L⌉.
Theorem 7.6: Any sorting algorithm that uses only
comparisons between elements requires at least
⌈log(N!)⌉ comparisons in the worst case.
Theorem 7.7: Any sorting algorithm that uses only
comparisons between elements requires Ω(N log N)
comparisons, since log(N!) = Ω(N log N).
7.8

Linear-Time Sorts: Bucket Sort and Radix Sort

Any general sorting algorithm that uses only


comparisons requires Ω(N log N) time in the worst
case.
But it is still possible to sort in linear time in some
special cases when extra information is available.
Two special cases:
Bucket sort: input is positive integers smaller than M
Radix sort: input is assumed to be only small integers
7.11

Bucket Sort
Input: A1, A2, ..., AN, positive integers < M.
Bucket Sort Algorithm:
Keep an array called count, of size M, initialized with
all 0s.
When Ai is read, increment count[Ai] by 1.
After all the input is read, scan the count array,
printing out a representation of the sorted list.

Note: count has M cells, or buckets.


The algorithm takes O(M + N)
7.11
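The three-step bucket sort algorithm above translates almost line for line into Java; the class name is illustrative, and for simplicity this sketch returns a new sorted array rather than printing.

```java
// Bucket sort for non-negative integers strictly smaller than m:
// tally each value into count[], then scan the buckets in order.
public class BucketSortDemo {
    public static int[] bucketSort(int[] input, int m) {
        int[] count = new int[m];                 // m buckets, all zero
        for (int x : input)
            count[x]++;                           // read Ai: count[Ai]++
        int[] out = new int[input.length];
        int k = 0;
        for (int v = 0; v < m; v++)               // scan the count array
            for (int c = 0; c < count[v]; c++)
                out[k++] = v;
        return out;
    }

    public static void main(String[] args) {
        int[] sorted = bucketSort(new int[]{3, 1, 4, 1, 5}, 10);
        System.out.println(java.util.Arrays.toString(sorted)); // [1, 1, 3, 4, 5]
    }
}
```

The two loops make the O(M + N) cost visible: one pass over the N inputs, one pass over the M buckets.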

Radix Sort
Radix sort is sometimes known as card sort. It
was used by the old electromechanical IBM card
sorters to sort punched cards.

7.11

Radix Sort
Input: 10 numbers in the range 0 to 999.
Principle: too many buckets, so bucket sort is not so
useful here. How about using several passes of
bucket sort?
Perform the bucket sorts in reverse digit order, starting
with the least significant digit first.

7.11

Radix Sort: Example


For example, the input sequence is 64, 8, 216, 512, 27,
729, 0, 1, 343, 125
Running time: O(p(N+b)), p = #passes, N = #inputs,
b = #buckets

7.11
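A least-significant-digit radix sort over base-10 buckets, as in the example, might look like the following sketch (names are illustrative). Stability of each bucket pass is what makes the later, more significant passes preserve earlier orderings.

```java
import java.util.ArrayList;
import java.util.List;

// LSD radix sort with 10 buckets; 'passes' is the number of decimal
// digits to process (3 for inputs in the range 0..999).
public class RadixSortDemo {
    public static void radixSort(int[] a, int passes) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int b = 0; b < 10; b++)
            buckets.add(new ArrayList<>());
        int div = 1;                                  // 1, 10, 100, ...
        for (int p = 0; p < passes; p++, div *= 10) {
            for (int x : a)                           // distribute by digit
                buckets.get((x / div) % 10).add(x);
            int k = 0;
            for (List<Integer> bucket : buckets) {    // collect in order
                for (int x : bucket)
                    a[k++] = x;
                bucket.clear();
            }
        }
    }
}
```

Each of the p passes touches all N inputs and all b buckets, matching the O(p(N + b)) bound above.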

External Sorting
Internal sorting algorithms take advantage of the
fact that memory is directly addressable.
External sorting algorithms are designed to handle
very large inputs, where the input is much too large to fit
into memory.
Sometimes the time it takes to read the input is
significant compared to the time to sort the input,
even though sorting is an O(N log N) operation and
reading the input is only O(N): in reality, the constant
hidden in reading the input is much larger.

7.12

Model for External Sorting


For external sorting, we consider work on tapes,
which are probably the most restrictive storage
medium.
Tapes can be efficiently accessed only in sequential
order (in either direction).
In our model, we assume we have at least three
tape drives to perform the sorting: two drives for
efficient sorting; the third drive simplifies matters.
If only one tape drive can be used, any algorithm will
require Ω(N²) tape accesses.

7.12

External Sorting: The Simple Algorithm


M = 3

Assume that the internal memory can hold and sort M records at a
time. So M records are read at a time from the input tape.
We use four tapes: two input
and two output tapes
Merge

Each set of sorted records is called a run.


After this is done, we rewind all the tapes.

7.12

External Sorting: The Simple Algorithm


Merged

Rewind all four tapes and repeat the same steps.

Continue the process until we get one run of length N.

7.12

External Sorting: The Simple Algorithm


The simple algorithm for external sorting requires
⌈log₂(N/M)⌉ passes, plus the initial run-constructing
pass.
For example: with 10 million records of 128 bytes each and
4 MB of internal memory, the first pass will
create 320 runs. We would then need 9 more
passes to complete the sort.

7.12

External Sorting: Multiway Merge


Multiway merge: if we have extra tapes, then we
can expect to reduce the number of passes
required to sort our input.
The number of passes required using k-way
merging is ⌈log_k(N/M)⌉.

7.12

External Sorting: Multiway Merge


We then need two more passes of three-way merging to complete the sort.

⌈log₃(13/3)⌉ = 2

7.12

External Sorting: Polyphase Merge


The k-way merging strategy requires the use of 2k
tapes. It is possible to get by with only k+1 tapes.

Split the number of runs into two Fibonacci numbers


F(N-1) and F(N-2) (below).

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...


7.12

External Sorting: Replacement Selection


Replacement selection: as soon as the first record
is written to an output tape, the memory it used
becomes available for another record.
Initially, M records are read into memory and
placed in a priority queue.
We perform a deleteMin, writing the smallest
record to the output tape

7.12

External Sorting: Replacement Selection

7.12

8. The Disjoint Set Class

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

List Stacks and Queues

Assignment 2

Trees
Hashing

Assignment 3

Priority Queues

Mid-Term

Sorting
The Disjoint Set Class

Assignment 4

Graph Algorithms

Final Exam

Agenda for Lecture 22


Book-Keeping
Ch.8 The Disjoint Set Class
Equivalence Relations
The Dynamic Equivalence Problem
Basic Data Structure
Smart Union Algorithms

WEEK #11
LECTURE #22

Reminder: Academic Integrity

0.67% ??!!!!

Wearable Computing
Interested in wearable computing projects?
Android programming
Android L
Android Wear
Google Fit

iOS Programming
Swift
ANCS

Arduino

Intel Edison Development Kit


Node.js

C Programming

Bluetooth Low Energy (BLE) Protocols

Introduction
The Disjoint Set Class is an efficient data structure to
solve the equivalence problem.
The data structure is simple to implement
The implementation is extremely fast
The analysis is extremely difficult

In this chapter, we will


Show a relevant implementation
Increase its speed, using just two simple observations
Analyze the running time
See a simple application
8.0

Equivalence Relations
Define a relation R on members of a set S:
For each pair of elements (a, b), where a and b are in
S, a R b is either true or false.
If a R b is true, then a is related to b.

An equivalence relation R satisfies three


properties
Reflexive: a R a for all a in S.
Symmetric: a R b if and only if b R a.
Transitive: If a R b and b R c then a R c.
8.1

Example of Equivalence


Electrical connectivity, where all connections are by metal
wires
Reflexive:

a R a for all a in S.
Any component is connected to itself.

Symmetric:

a R b if and only if b R a.
If a is electrically connected to b, then b must be electrically
connected to a.

Transitive:

If a R b and b R c then a R c.
If a is connected to b and b is connected to c, then a is connected
to c.
8.1

Example of Equivalence


Two cities in the same country, where the cities are
connected by roads.
Reflexive:

a R a for all a in S.
Any city is connected to itself.

Symmetric:

a R b if and only if b R a.
If it is possible to travel from city a to city b by roads, then it is
also possible to travel from city b to city a by roads.

Transitive:

If a R b and b R c then a R c.
If it is possible to travel from city a to city b and from city b to city
c, then it is possible to travel from city a to city c.
8.1

The Dynamic Equivalence Problem


If we use ~ to denote an equivalence relation, the problem
is then to decide, for any a and b, if a ~ b.
Example:

Given the set: {a1, a2, a3, a4, a5}


There are 25 pairs of elements, each either related or not.
The information a1~a2, a3~a4, a5~a1, a4~a2 implies that all pairs
are related.

The equivalence class of an element a ∈ S is the subset of S


that contains all the elements that are related to a.
Every member of S appears in exactly one equivalence class.
To decide if a~b, we need only check whether a and b are in the
same equivalence class.

8.2

The Dynamic Equivalence Problem


Disjoint Sets: Si ∩ Sj = ∅

The input is initially a collection of N sets, each with one


element. The initial representation is that all relations
(except reflexive relations) are false. Each set has a
different element, so that the sets are disjoint.
Two permissible operations:
find -> returns the name of the set (equivalence class)
containing a given element.
union -> to add the relation a ~ b, check if a and b are in the same
equivalence class. If not, apply union to create a new set: Sk = Si ∪ Sj

find(a)==find(b) is true if and only if a and b are in


the same set.

8.2

Performance of find and union


For the find operation to be fast, we could
maintain, in an array, the name of the equivalence
class for each element. The running time for find is then a
simple O(1) lookup.
For the union(a, b) operation, we scan down the
array, changing every entry in the equivalence class i of a to the
equivalence class j of b. A sequence of N-1 such unions
takes Θ(N²).
Instead, we want a solution to the union/find problem
that makes unions easy but finds hard.
8.2

Basic Data Structure


A find operation doesn't need to return any specific
name; just that finds on two elements return the
same answer if and only if they are in the same set.
We can use a tree to represent each set.
For example, we have eight elements, initially in
different sets.

8.3

Basic Data Structure


After union(4, 5) and union(6, 7)

8.3

Basic Data Structure


After union(4, 6)

Implicit representation of the previous tree

8.3

Basic Data Structure


A find(x) on element x is performed by
returning the root of the tree containing x.
The time to perform this operation is proportional
to the depth of the node representing x.
The worst case is a tree of depth N-1, so
the worst-case running time of a find is Θ(N).

8.3

Basic Data Structure

8.3

Basic Data Structure

8.3

Agenda for Lecture 23


Book-Keeping
Revised class schedule
Final Exam Schedule
Assignment #4

Ch.8 The Disjoint Set Class


Smart Union Algorithms
An Application

WEEK #12
LECTURE #23

Smart Union Algorithms


Result of union-by-size if the next operation were
union(3, 4). For union-by-size, the smaller tree
becomes a subtree of the larger.
If unions are done by size, the depth of any node
is never more than log N.

8.4

Smart Union Algorithms


Result of an arbitrary union. If we don't use
union-by-size, a deeper forest may be formed.

8.4

Smart Union Algorithms


Worst-case tree for N=16. This happens when all
unions are between equal-sized trees.

8.4

Smart Union Algorithms


Union-by-height: we can also keep track of the
height, instead of the size. The depth of any tree is at
most O(log N).

8.4

Smart Union Algorithms


Source code for union-by-height.

8.4

Path Compression
Problems with the union/find algorithms
The worst case of O(M log N) for the union/find
algorithm can occur fairly easily and naturally.
If there are many more finds than unions, this
running time is bad.

8.5

Path Compression
Path compression is an operation that does
something clever during the find operation.
Path compression is performed during a find
operation and is independent of the strategy used
to perform unions.
Suppose the operation is find(x). Then the effect
of path compression is that every node on the
path from x to the root has its parent changed to
the root.
8.5

Path Compression
An example of path compression after find(14) on the
generic worst tree. Nodes 12 & 13, and nodes 14 & 15, are
now closer to the root.

8.5

Path Compression
Code for the disjoint set find with path
compression.

That's the only change


required for path compression

8.5
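The textbook's array representation, union-by-height, and the one-line path-compression change can be combined into a compact Java sketch; the class name `DisjSets` follows the book's convention, and `find` here takes any element while `union` expects roots.

```java
// Disjoint sets: s[i] < 0 marks a root (the value encodes height),
// otherwise s[i] is the index of i's parent.
public class DisjSets {
    private final int[] s;

    public DisjSets(int numElements) {
        s = new int[numElements];
        java.util.Arrays.fill(s, -1);     // every element is its own tree
    }

    public int find(int x) {
        if (s[x] < 0)
            return x;                     // x is a root
        return s[x] = find(s[x]);         // path compression: the only change
    }

    // union-by-height; root1 and root2 must be roots, as in the textbook
    public void union(int root1, int root2) {
        if (s[root2] < s[root1])          // root2's tree is deeper
            s[root1] = root2;
        else {
            if (s[root1] == s[root2])
                s[root1]--;               // equal heights: new root grows by 1
            s[root2] = root1;
        }
    }
}
```

Reproducing the slides' example: after `union(4, 5)`, `union(6, 7)`, and a union of their roots, `find(5)` and `find(7)` return the same root, while element 0 stays in its own set.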

An Application
An example of the use of the union/find data structure is the
generation of mazes.

8.7

An Application
Initial state: all walls up, all cells in their own set.

8.7

An Application
At some point in the algorithm: several walls down, sets have merged; if
at this point the wall between 8 and 13 is randomly selected, this wall is
not knocked down, because 8 and 13 are already connected.

8.7

An Application
The wall between squares 18 and 13 is randomly selected; this wall is
knocked down, because 18 and 13 are not already connected; their sets
are merged.

8.7

An Application
Eventually, 24 walls are knocked down; all elements
are in the same set.

8.7

Wanna Build a Maze Game?


Many great references out there:

Demos of Maze Generation Algorithms:


http://www.jamisbuck.org/presentations/rubyconf2011/
index.html
Maze Generation in 3D:
http://totologic.blogspot.com/2013/04/maze-generation-
in-3d.html
Maze Tutorial (in Java):
http://forum.codecall.net/topic/63862-maze-tutorial/
Maze Generation: Eller's Algorithm (in Ruby):
http://weblog.jamisbuck.org/2010/12/29/maze-
generation-eller-s-algorithm

Material in the Backup Section is out of the scope of this class.

BACKUP

Slowly Growing Functions

Assume that f(N) is a well-defined function that


reduces N. For the recurrence equation above, we
iteratively apply f(N) until we reach 1 or less.
We call the solution to this equation f*(N).
Example: analysis of a binary tree
f(N) = N/2; each step halves N.
We do this at most log N times until N reaches 1
So we have f*(N) ≈ log N
8.6

Different Values of the Iterated Function

The solution T(N) = log* N is known as


the iterated logarithm.

8.6

Iterated Logarithm
For practical purposes, the iterated logarithm with base 2
has a value no more than 5.

Note: lg denotes the binary logarithm

lg* 4 = 2

8.6

An Analysis by Recursive Decomposition


Establishing a tight bound on the running time of a sequence of
M = Θ(N) union/find operations:

Lemma 8.1 When executing a sequence of union instructions, a


node of rank r > 0 must have at least one child of rank 0, 1, ..., r-1.
Lemma 8.2 At any point in the union/find algorithm, the ranks of the
nodes on a path from a leaf to a root increase monotonically.

8.6

Partial Path Compression

Algorithm A is our standard sequence of union-by-rank and find with path


compression operations. We design an Algorithm B to perform all the unions prior to
any find.
Then each find operation in Algorithm A is replaced by a partial find operation in
Algorithm B.
A partial find operation specifies the search item and the node up to which the path
compression is performed. The node that will be used is the node that would have been
the root at the time the matching find was performed in Algorithm A.

8.6

9. Graph Algorithms

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java Programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Quiz 1

List Stacks and Queues


Trees
Hashing
Priority Queues
Sorting

Assignment 2

Quiz 2
Quiz 3

Assignment 3

Mid-Term

The Disjoint Set Class


Graph Algorithms

Assignment 4

Algorithm Design Techniques

This Lecture

Quiz 4
Final Exam

Agenda for Lecture 24


Book-Keeping
Ch.9 Graph Algorithms
Introduction
Definitions
Topological Sort
Shortest-Path Algorithms

WEEK #12
LECTURE #24

Introduction
We are going to discuss several common problems in
graph theory.
In many applications, graph algorithms are too slow unless we pay
attention to the choice of data structure.

In this chapter, we will

Show the conversion of real-life problems to problems on


graphs
Give algorithms to solve several common graph problems
Show how the proper choice of data structures can
drastically reduce the running time of these algorithms
Learn depth-first search and show how it can be used to
solve several seemingly nontrivial problems in linear time.
9.0

Graph Theory
In computer science, graph theory is the study of
graphs, which are mathematical structures used
to model pairwise relations between objects.

Seven Bridges of Königsberg


The problem was to find a walk through the city that
would cross each bridge once and only once.
The islands could not be reached
by any route other than the
bridges, and every bridge must
have been crossed completely
every time; one could not walk
halfway onto the bridge and then
turn around and later cross the
other half from the other side.
Leonhard Euler proved that the
problem has no solution: there is
no walk that crosses every bridge
exactly once without retracing.

Seven Bridges of Königsberg


Euler pointed out that the choice of route inside each land
mass is irrelevant.
The only important feature of a route is the sequence of
bridges crossed. That allowed him to reformulate the
problem in abstract terms.
The resulting mathematical structure is called a graph.

Applications for Graphs


As one of the most versatile data structures in
computer science, graphs find their way into many
applications:
Representation of a map of locations and distances
between them;
State transitions in computer algorithms
Relationships such as family trees, business and
military organizations, etc.
Connectivity in computer and communications
networks.

Social Network Diagram


A social network
diagram displaying
friendship ties
among a set of
Facebook users.

Graph, Vertices and Edges


A graph G = (V, E) is a set of vertices V and a set of
edges (or arcs) E.
An edge is a pair (v, w), where v and w are in V.
If the pair is ordered, the graph is directed and is called a
digraph.
Vertex w is adjacent to vertex v if and only if (v, w) is in E
In an undirected graph, both (v, w) and (w, v) are in E.
v is adjacent to w, and w is adjacent to v.

An edge can have a weight or cost component.
9.1

Planning a Road Trip


Round trip: ~2,000 miles

Path
A path is a sequence of vertices w1, w2, w3, ..., wN
where (wi, wi+1) is in E, for 1 ≤ i < N.
The length of the path is the number of edges on the
path.
A simple path has all distinct vertices, except that the
first and last can be the same.

9.1

Cycle
A cycle in a directed graph is a path of length at least 1
where w1 = wN.
A directed graph with no cycles is acyclic.
A DAG is a directed acyclic graph.

9.1

More on Definitions
An undirected graph is connected if there is a path
from every vertex to every other vertex.
A directed graph with this property is strongly
connected.
A directed graph is weakly connected if it is not
strongly connected but the underlying undirected
graph is connected.

A complete graph has an edge between every pair


of vertices.

9.1

Representation of Graphs
Represent a directed graph with an adjacency list.
For each vertex, keep a list of all adjacent vertices.

9.1

Topological Sort
A topological sort is an ordering of vertices in a
directed acyclic graph, such that if there is a path
from vi to vj, then vi comes before vj in the
ordering.

9.2

Topological Sort
We can use a graph to represent the prerequisites
in a course of study.
A directed edge
from Course A to
Course B means
that Course A is
a prerequisite
for Course B.

9.2

Topological Sort
Topological sort example using a queue.
Start with vertex v1.
On each pass, remove the vertices with indegree = 0.
Subtract 1 from the indegree of the adjacent vertices.

The indegree of a vertex v is the number of edges (u, v).

9.2

Topological Sort
Result of applying topological sort to the graph
A vertex is put on the queue as soon as its indegree falls to 0.

Pseudo-code for Topological Sort

Running time = O(|E| + |V|)

9.2
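The queue-based pseudo-code can be sketched in Java over an adjacency list; `TopoSortDemo` and the cycle-detection exception are illustrative choices, not from the textbook.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

// Queue-based topological sort over an adjacency list: adj.get(v) holds
// the vertices adjacent to v. Runs in O(|V| + |E|).
public class TopoSortDemo {
    public static int[] topSort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indegree = new int[n];
        for (List<Integer> edges : adj)            // count incoming edges
            for (int w : edges)
                indegree[w]++;

        Queue<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (indegree[v] == 0)                  // start with indegree-0 vertices
                q.add(v);

        int[] order = new int[n];
        int count = 0;
        while (!q.isEmpty()) {
            int v = q.remove();
            order[count++] = v;
            for (int w : adj.get(v))               // "remove" v's outgoing edges
                if (--indegree[w] == 0)
                    q.add(w);                      // enqueue as soon as indegree hits 0
        }
        if (count != n)
            throw new IllegalStateException("graph has a cycle");
        return order;
    }
}
```

Each vertex is enqueued once and each edge decrements one counter, which is where the O(|E| + |V|) bound comes from.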

Agenda for Lecture 25


Ch.9 Graph Algorithms
Shortest-Path Algorithms
Unweighted Shortest Paths
Dijkstra's Algorithm
Graphs with Negative Edge Costs
Acyclic Graphs

Book-Keeping
Slide deck uploaded to Canvas
Midterm grades

WEEK #13
LECTURE #25

Shortest-Path Algorithms

Assume there is a cost associated with each edge.


The cost of a path is the sum of the cost of each edge on
the path.
We consider the weighted path length in our problem.

9.3

Shortest-Path Algorithms
Find the least-cost path from a distinguished
vertex s to every other vertex in the graph.

The shortest weighted path from v1 to v6 has a cost of 6.

9.3

Shortest-Path Algorithms
A graph with a negative-cost cycle.
The shortest path from v5 to v4 is undefined.
Negative-Cost Cycle

9.3

Shortest-Path Algorithms
We are going to examine algorithms to solve
four versions of the shortest-path problem:
Solve the unweighted shortest-path problem.
Solve the weighted shortest-path problem if there are no
negative edges.
Solve the weighted shortest-path problem if the graph has
negative edges.
Solve the weighted problem for the special case of acyclic
graphs in linear time.

9.3

Unweighted Shortest Path
An unweighted directed graph G
The unweighted shortest path is clearly a special case of the
weighted shortest-path problem, since we could assign
all edges a weight of 1.

9.3.1

Breadth-First Strategy
Breadth-first search: processing vertices in layers.
The vertices closest to the start are evaluated first,
and the most distant vertices are evaluated last.
Graph after marking the start node as reachable in
zero edges

9.3.1

Breadth-First Strategy
Graph after finding all vertices whose path
length from s is 1.

9.3.1

Breadth-First Strategy
Graph after finding all vertices whose shortest
path is 2

9.3.1

Breadth-First Strategy
Final shortest paths

9.3.1

Translating Strategy into Code


Initial configuration of the table used in the unweighted
shortest-path computation
Keep the tentative distance from
vertex v3 to another vertex in dv.
Keep track of the path in pv.
A vertex becomes known after it
has been processed.
No cheaper path can be found.

Start with 0 as the current


distance.

9.3.1

Translating Strategy into Code


During each iteration,
process an unknown
vertex v whose distance
dv = current distance.
Mark v as known.
For each vertex w
adjacent to v:

Set its distance dw to the


current distance + 1
Set its path pw to v.

Increment the current


distance.

9.3.1

Unweighted Shortest Path
Refined algorithm using a queue

The "known" field
can be discarded.

9.3.1

Unweighted Shortest Path
Refined algorithm using a queue

9.3.1
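The refined queue-based algorithm (with the "known" field discarded) can be sketched as follows; `UnweightedPathDemo` is an illustrative name, and unreachable vertices are marked with -1 here instead of infinity.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Breadth-first unweighted shortest paths from source s:
// dist[v] is the fewest edges from s to v, or -1 if unreachable.
public class UnweightedPathDemo {
    public static int[] shortestPaths(List<List<Integer>> adj, int s) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                 // -1 plays the role of infinity
        dist[s] = 0;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();                // dequeuing replaces "known"
            for (int w : adj.get(v))
                if (dist[w] == -1) {           // first visit is the shortest
                    dist[w] = dist[v] + 1;
                    q.add(w);
                }
        }
        return dist;
    }
}
```

The queue processes vertices in layers, exactly the breadth-first strategy of the preceding slides, in O(|V| + |E|) time.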

Weighted Shortest Path


Use a greedy algorithm: Dijkstra's
Algorithm
A greedy algorithm does what appears to be
best at each stage;
It may not always work.

Implementation:
Keep the same information for each vertex;
The information is either known or
unknown
Tentative distance dv
Path information pv
9.3.2

Dijkstra's Algorithm
Initial configuration of the table used in Dijkstra's
algorithm.

9.3.2

Dijkstra's Algorithm
After v1 is declared known.

9.3.2

Dijkstra's Algorithm
After v4 is declared known.

9.3.2

Dijkstra's Algorithm
After v2 is declared known.

V2*:
Not updating V5:
(2+10) > 3

9.3.2

Dijkstra's Algorithm
After v5 and then v3 are declared known.

V3*:
Updating V6:
(3+5) < 9
V5*:
Not updating V7: (3+6) > 5
9.3.2

Dijkstra's Algorithm
After v7 is declared known.

V7*:
Updating V6:
(5+1) < 8

9.3.2

Dijkstra's Algorithm
After v6 is declared known and the algorithm
terminates.

9.3.2

Dijkstra's Algorithm
Pseudocode for Dijkstra's Algorithm

9.3.2
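A common way to realize the pseudocode is with a priority queue of (distance, vertex) entries; popping a stale entry and skipping it plays the role of the "known" check. The class and `Edge` type here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Dijkstra's algorithm for non-negative edge costs, using a priority
// queue of {distance, vertex} pairs; stale entries are skipped.
public class DijkstraDemo {
    public static class Edge {
        final int to;
        final int cost;
        public Edge(int to, int cost) { this.to = to; this.cost = cost; }
    }

    public static long[] dijkstra(List<List<Edge>> adj, int s) {
        long[] dist = new long[adj.size()];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[s] = 0;
        PriorityQueue<long[]> pq =
            new PriorityQueue<>((x, y) -> Long.compare(x[0], y[0]));
        pq.add(new long[]{0, s});
        while (!pq.isEmpty()) {
            long[] top = pq.remove();
            long d = top[0];
            int v = (int) top[1];
            if (d > dist[v]) continue;          // stale entry: v already known
            for (Edge e : adj.get(v))           // relax edges out of v
                if (dist[v] + e.cost < dist[e.to]) {
                    dist[e.to] = dist[v] + e.cost;
                    pq.add(new long[]{dist[e.to], e.to});
                }
        }
        return dist;
    }
}
```

The greedy step is the `pq.remove()`: the unknown vertex with the smallest tentative distance is declared known, which is safe only because edge costs are non-negative.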

Graphs with Negative Edge Costs


Dijkstra's algorithm does not
work on graphs with
negative edge costs.
Adding a constant to each
edge cost won't work.
A combination of the
weighted and unweighted
algorithms will solve the
problem, but at the cost of an
increase in running time.

Not using the


concept of
known vertices

Pseudocode for the weighted


shortest-path algorithm with
negative edge costs
9.3.3
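One way to realize this combination is queue-based relaxation (in essence the Bellman-Ford approach): relax edges as in the weighted algorithm, drive the process with a queue as in the unweighted one, and drop the notion of known vertices. A sketch, assuming the graph has no negative-cost cycles:

```python
from collections import deque

def negative_weighted_paths(adj, s):
    # adj: vertex -> list of (neighbor, cost); costs may be negative,
    # but the graph must not contain a negative-cost cycle.
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    q = deque([s])
    on_queue = {s}                # no 'known' set: a vertex may re-enter
    while q:
        v = q.popleft()
        on_queue.discard(v)
        for w, c in adj[v]:
            if dist[v] + c < dist[w]:
                dist[w] = dist[v] + c
                if w not in on_queue:
                    q.append(w)
                    on_queue.add(w)
    return dist
```

Because a vertex can be dequeued repeatedly, the worst-case running time rises to O(|E| · |V|).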

Agenda for Lecture 26


Book-Keeping
Ch.9 Graph Algorithms
Acyclic Graphs
Shortest-Path Example
Network Flow Problems
A Simple Maximum-Flow Algorithm

Minimum Spanning Tree


Prim's Algorithm
Kruskal's Algorithm

Applications of Depth-First Search



CS146 Data Structures and Algorithms, Spring 2015, Angus Yeung, Ph.D.

WEEK #13
LECTURE #26

Acyclic Graphs
Vertex Selection Rule:
If the graph is known to be acyclic, we can improve
Dijkstra's algorithm by changing the order in which
vertices are declared known.
The new rule is to select vertices in topological
order.
It works because when a vertex is selected, its
distance can no longer be lowered, since by the
topological ordering rule it has no incoming edges
emanating from unknown nodes.
9.3.4
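The selection rule above can be sketched by computing a topological order first (Kahn's indegree-counting algorithm is one way to get it) and then relaxing edges in that order:

```python
def dag_shortest_paths(adj, s):
    # adj: vertex -> list of (neighbor, cost); the graph must be acyclic.
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w, _ in adj[v]:
            indegree[w] += 1
    order = [v for v in adj if indegree[v] == 0]
    for v in order:                   # Kahn's algorithm: grow the order
        for w, _ in adj[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                order.append(w)
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    for v in order:                   # select vertices in topological order
        if dist[v] == float('inf'):
            continue                  # unreachable from s
        for w, c in adj[v]:
            if dist[v] + c < dist[w]:
                dist[w] = dist[v] + c
    return dist
```

No priority queue is needed, so the whole computation is O(|E| + |V|).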

Critical Path Analysis


Critical Path Analysis is an application of acyclic
graphs.
Like a downhill skiing problem: we want to get from
point a to point b but can only go downhill, so clearly
there are no cycles.

9.3.4

Activity-Node Graph


The activity-node graph shows the time it takes to
complete the activity at each node.
Which tasks can be delayed?

9.3.4

Event-Node Graph


We convert the activity-node graph to an
event-node graph to find the completion time of
the project.

9.3.4

Event-Node Graph


The earliest completion time for node w, working
forward from EC1 = 0, is
ECw = max over edges (v,w) of (ECv + cv,w)

9.3.4

Event-Node Graph


The latest time, LCv, that each event can finish
without affecting the final completion time:
Work backward from LCn = ECn, using
LCv = min over edges (v,w) of (LCw − cv,w)

9.3.4

Event-Node Graph


The slack time for each edge (v,w) in the event-node graph,
Slack(v,w) = LCw − ECv − cv,w, represents the
amount of time that the completion of the corresponding activity can
be delayed without delaying the overall completion.

Earliest completion time, latest completion time, and slack.

9.3.4

Shortest-Path Example

Word ladder problem. For instance: zero → hero → here → hire → fire → five

9.3.6
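The word ladder is an unweighted shortest-path problem over an implicit graph whose vertices are words and whose edges join words differing in exactly one letter. A breadth-first-search sketch, assuming lowercase words of equal length:

```python
from collections import deque
import string

def word_ladder(start, goal, words):
    # BFS over the implicit word graph; prev doubles as the visited set.
    words = set(words) | {start, goal}
    prev = {start: None}
    q = deque([start])
    while q:
        w = q.popleft()
        if w == goal:                     # rebuild the ladder backward
            chain = []
            while w is not None:
                chain.append(w)
                w = prev[w]
            return chain[::-1]
        for i in range(len(w)):           # all one-letter changes of w
            for c in string.ascii_lowercase:
                cand = w[:i] + c + w[i+1:]
                if cand in words and cand not in prev:
                    prev[cand] = w
                    q.append(cand)
    return None                           # no ladder exists
```

Because BFS explores by increasing path length, the first ladder found is a shortest one.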

Model for Breadth-First Search


Think of applying Breadth-First Search when we are
searching for a lost child inside a large building full of
rooms.
Here is our model:

Graph = The large building


Edge = Hallway between rooms
Vertex = Each room

How we search for the lost child:

We start in the room where we last saw the child


We search each room adjacent to the first room and put a tag on the
door to mark the room as searched
Then we search each room adjacent to the rooms we have already
searched
We repeatedly search all the rooms adjacent to the rooms already
searched until we find the child

Model for Depth-First Search


Think of applying Depth-First Search when you are lost in a
maze.
Here is our model:

Graph = Maze
Edge = Path
Vertex = Each intersection in the maze

How we can get out of the maze:

Suppose we have a bag of bread crumbs


Drop bread crumbs to mark your path
Whenever we come to a dead end, we retrace our path by following our
bread crumbs
We continue retracing our path, or backtracking, to an intersection with
an unmarked path
We go down the unmarked path, backtracking the same way when we hit
a dead end again
Repeat the process until we get out of the maze

Network Flow Problems


The problem is to determine the maximum amount of
flow that can pass from s to t.
A graph (left) and its maximum flow:
source

The max flow is 2 + 3 = 5

sink

9.4

Network Flow Problems


A cut in graph G partitions the vertices with s and t in
different groups. The total edge cost across the cut is 5,
proving that a flow of 5 is maximum:

9.4

A Simple Maximum-Flow Algorithm


Initial stages of the graph G, the flow graph Gf, and
the residual graph Gr:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after two units of flow added along s, b,
d, t:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after two units of flow added along s, a,
c, t:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr after one unit of flow added along s, a,
d, t; the algorithm terminates:

9.4.1

A Simple Maximum-Flow Algorithm


G, Gf, Gr if the initial action is to add three units of flow
along s, a, d, t: the algorithm terminates after one
more step with a suboptimal solution: 3 + 1 = 4

9.4.1

A Simple Maximum-Flow Algorithm


Graphs after three units of flow added along s,
a, d, t using the correct algorithm

Undoing the flow
9.4.1

A Simple Maximum-Flow Algorithm


Graphs after two units of flow added along s, b,
d, a, c, t using the correct algorithm

Undoing the flow
9.4.1

A Simple Maximum-Flow Algorithm


The vertices reachable from s in the residual graph
form one side of a cut; the unreachable vertices form the
other side of the cut.

9.4.1
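The correct algorithm above relies on residual back edges that let a later augmenting path undo earlier flow. A sketch, choosing shortest augmenting paths by BFS (the capacity-dict representation is an assumption):

```python
from collections import deque

def max_flow(capacity, s, t):
    # residual[u][v] = remaining capacity on edge u -> v.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # back edge
    flow = 0
    while True:
        # BFS for an augmenting path in the residual graph.
        prev = {s: None}
        q = deque([s])
        while q and t not in prev:
            u = q.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in prev:
                    prev[v] = u
                    q.append(v)
        if t not in prev:
            return flow               # no augmenting path: flow is maximum
        path, v = [], t               # collect the path edges
        while prev[v] is not None:
            path.append((prev[v], v))
            v = prev[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck   # allow undoing the flow later
        flow += bottleneck
```

Choosing shortest paths (the Edmonds-Karp rule) bounds the number of augmentations by O(|E| · |V|) and avoids the suboptimal 3 + 1 = 4 outcome shown earlier.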

Minimum Spanning Tree


A Minimum Spanning Tree (MST)
Is an acyclic tree
Spans every vertex
Has |V|−1 edges
Has minimum total cost

Add each edge to a Minimum Spanning Tree in


such a way that
It does not create a cycle
It is the least-cost addition

It is a greedy algorithm
9.5

Minimum Spanning Tree


A graph G and its minimum spanning tree:

9.5

Prim's Algorithm
Prim's algorithm after each stage:

9.5.1

Prim's Algorithm
Initial configuration of the table used in Prim's
algorithm for Minimum Spanning Tree:

9.5.1

Prim's Algorithm
The table after v1 is declared known:

9.5.1

Prim's Algorithm
The table after v4 is declared known:

9.5.1

Prim's Algorithm
The table after v2 and then v3 are declared
known:

9.5.1

Prim's Algorithm
The table after v7 is declared known:

9.5.1

Prim's Algorithm
The table after v6 and then v5 are selected
(Prim's algorithm terminates).

9.5.1
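Prim's algorithm can be sketched with a priority queue holding candidate edges out of the known set. This is an illustrative variant; the textbook's version updates the dv/pv table and scans for the smallest unknown entry instead:

```python
import heapq

def prim(adj, start):
    # adj: vertex -> list of (neighbor, weight); undirected, so each
    # edge appears in both adjacency lists.
    known = {start}
    tree = []                         # MST edges as (u, v, weight)
    heap = [(w, start, v) for v, w in adj[start]]
    heapq.heapify(heap)
    while heap and len(known) < len(adj):
        w, u, v = heapq.heappop(heap)
        if v in known:
            continue                  # this edge would create a cycle
        known.add(v)
        tree.append((u, v, w))
        for x, wx in adj[v]:
            if x not in known:
                heapq.heappush(heap, (wx, v, x))
    return tree
```

Each accepted edge is the cheapest way to extend the tree, which is exactly the greedy MST rule.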

Kruskal's Algorithm
Kruskal's is a greedy algorithm using equivalence
classes
First partition the vertices into |V| equivalence classes
Process the edges in order of weight
Add an edge to the Minimum Spanning Tree and combine
two equivalence classes if the edge connects two vertices in
different equivalence classes

9.5.2

Kruskal's Algorithm
Action of Kruskal's algorithm on G:

9.5.2

Kruskal's Algorithm
Kruskal's algorithm after each stage:

9.5.2

Kruskal's Algorithm
Pseudocode for Kruskal's algorithm:

9.5.2

Applications of Depth-First Search


Pseudocode for the depth-first search template:

Tips for graph traversal algorithms

Similar to tree traversal: visit each vertex in a particular order


It may not be possible to reach all vertices from the start vertex
The graph may contain cycles, and we should not go into an infinite
loop
The problem is solved by marking each vertex as visited when it is
visited and avoiding revisiting marked vertices.

9.6
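The template with marking can be sketched as:

```python
def depth_first_search(adj, start):
    # Returns the vertices in the order they are visited.
    order, visited = [], set()

    def visit(v):
        visited.add(v)            # mark v before recursing, so a cycle
        order.append(v)           # cannot cause an infinite loop
        for w in adj[v]:
            if w not in visited:  # skip vertices already marked
                visit(w)

    visit(start)
    return order
```

Vertices not reachable from the start vertex are simply never visited; a full traversal would restart the search from each unvisited vertex.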

Undirected Graphs
An undirected graph and a depth-first search of the graph:
9.6.1

Undirected Graphs
Depth-first search of the graph, with the steps numbered
(legend: forward edge; already marked; return)

9.6.1

Biconnectivity
A connected undirected graph is biconnected if there are no vertices
whose removal disconnects the rest of the graph.
If a graph is not biconnected, the vertices whose removal would
disconnect it are called articulation points.
A graph with articulation points C and D, and the depth-first tree with
Num and Low values:
Low(v) is the minimum of
Num(v)
the lowest Num(w) among all back edges (v,w)
the lowest Low(w) among all tree edges (v,w)

Articulation point (the root is a special case):
some child w has Low(w) >= Num(v)
9.6.2

Biconnectivity
The depth-first tree that results if the depth-first search starts at C:
The root is a special case: it is an articulation
point only if it has more than one child.
Articulation point test for other vertices:
some child w has Low(w) >= Num(v)

9.6.2

Biconnectivity
Routine to assign Num to the vertices:

9.6.2

Biconnectivity
Pseudocode to compute Low and to test for articulation
points (the test for the root is omitted):

9.6.2

Biconnectivity
Testing for articulation points in one depth-first search

9.6.2
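The Num/Low computation and the articulation-point test can be sketched in a single depth-first search. The adjacency-list format is an assumption, and the simple parent check is not safe for multigraphs:

```python
def articulation_points(adj, root):
    # Num = preorder number; Low = lowest Num reachable via tree edges
    # plus at most one back edge.
    num, low, points = {}, {}, set()
    counter = [1]

    def dfs(v, parent):
        num[v] = low[v] = counter[0]
        counter[0] += 1
        children = 0
        for w in adj[v]:
            if w not in num:              # tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if v != root and low[w] >= num[v]:
                    points.add(v)         # removing v cuts off w's subtree
            elif w != parent:             # back edge
                low[v] = min(low[v], num[w])
        if v == root and children > 1:    # the root is the special case
            points.add(v)

    dfs(root, None)
    return points
```

A path a-b-c makes b an articulation point, while a triangle has none.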

Euler Circuits
A Puzzle:

Reconstruct these figures using a pen, drawing each line exactly once.
The pen may not be lifted from the paper while the drawing is being
performed.
As an extra challenge, make the pen finish at the same point at which it
started.

9.6.3

Euler Circuits
Conversion of the puzzle to a graph:

9.6.3

Euler Circuits
Graph for Euler circuit problem:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 10, 5:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 1, 3, 7, 4, 11, 10, 7, 9, 3, 4, 10, 5:

9.6.3

Euler Circuits
Graph remaining after 5, 4, 1, 3, 2, 8, 9, 6, 3, 7, 4, 11, 10, 7, 9,
3, 4, 10, 5

9.6.3

10. Algorithm Design Techniques

Dr. Angus Yeung

Course Structure
Foundation

Reinforcement

Attend lectures
Read book chapters

Written class assignments


Java programming assignments

Integration

Review course material


Study for exams

Introduction to CS146
Algorithm Analysis

Assignment 1

Quiz 1

Lists, Stacks, and Queues


Trees
Hashing
Priority Queues
Sorting

Assignment 2

Quiz 2
Quiz 3

Assignment 3

Mid-Term

The Disjoint Set Class


Graph Algorithms
Algorithm Design Techniques

This Lecture

Assignment 4


Quiz 4
Final Exam

Agenda for Lecture 27


Book-Keeping
Ch.10 Algorithm Design Techniques
Introduction
Greedy Algorithms
A Simple Scheduling Problem


WEEK #14
LECTURE #27

Introduction
Lists, Stacks, and Queues
Trees
Hashing
Priority Queues

We have been concerned with


the efficient implementation of
algorithms from Chapters 3 to 9.

Sorting
The Disjoint Set Class
Graph Algorithms
Algorithm Design Techniques


In Chapter 10, we will discuss
the design of algorithms and see
the general approaches.
10.0

Classification of Algorithm Design Techniques


Greedy
Algorithms

A Simple Scheduling Problem


Huffman Codes
Approx. Bin Packing

Divide and
Conquer

Running Time
Closest-Points Problem
Selection Problem

Dynamic
Programming

Using a Table Instead of Recursion


Ordering Matrix Multiplications
Optimal Binary Search Tree
All-Pairs Shortest Path

Randomized
Algorithms

Random Number Generators


Skip Lists
Primality Testing

Backtracking
Algorithms

The Turnpike Reconstruction Problem


Games
10.0

Greedy Algorithms
We have already seen three greedy algorithms in
the previous chapter: Dijkstra's, Prim's, and
Kruskal's.
Greedy algorithms always choose the local optimum,
a "take what you can get now" strategy, instead of
seeking the global optimum.
For example: the Coin Changing Problem
We repeatedly dispense the largest denomination.
To give out $17.61, we give out a $10 bill, a $5 bill, two $1
bills, a half dollar, a dime, and a penny.
10.1
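The coin-changing example can be sketched as follows (amounts are in cents to avoid floating point; the denomination list is the U.S. set used above):

```python
def make_change(amount_cents, denominations=(1000, 500, 100, 50, 25, 10, 5, 1)):
    # Greedily dispense the largest denomination that still fits.
    used = []
    for d in denominations:
        count, amount_cents = divmod(amount_cents, d)
        used.extend([d] * count)
    return used
```

For U.S. denominations the greedy choice happens to be optimal; for arbitrary coin systems it can fail, which is exactly the "may not always work" caveat of greedy algorithms.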

A Simple Scheduling Problem


Suppose we have four jobs with various running
times; the total cost, C, of the schedule is

C = Σk (N − k + 1) t_ik = (N + 1) Σk t_ik − Σk k·t_ik

Only the second sum affects the total cost.


We want this sum to be the maximum, which is
achieved by scheduling the shortest jobs first.
10.1.1

A Simple Scheduling Problem


Jobs and times, as well as Schedule #1.
The average completion time is (15+23+26+36)/4 = 25.

10.1.1

A Simple Scheduling Problem


Schedule #2 (optimal)
Average completion time = (3+11+21+36)/4 = 17.75

10.1.1

The Multiprocessor Case


As an example of the multiprocessor case, we have 9 jobs running on 3
processors.
The total time is 165, for an average of 165/9 = 18.33.
The algorithm is to start jobs in sorted order, cycling through processors.
10.1.1

The Multiprocessor Case


There is a second optimal solution for the multiprocessor
case.
The average completion time is again (3 + 5 + 6 + 14 + 15 + 20 + 30
+ 34 + 38)/9 = 165/9 = 18.33.

10.1.1

The Multiprocessor Case


We can also consider minimizing the final completion time,
which here is 34.
Minimizing the final completion time is apparently much
harder than minimizing the mean completion time.

10.1.1

Agenda for Lecture 28


Book-Keeping
Ch.10 Algorithm Design Techniques
Greedy Algorithms
Huffman Codes

Divide and Conquer


Closest-Points Problem

WEEK #14
LECTURE #28

Dynamic Programming
Using a Table Instead of Recursion
Optimal Binary Search Tree

Agenda for Lecture 29


Ch.10 Algorithm Design Techniques
Greedy Algorithms
Huffman Codes

Divide and Conquer

WEEK #15
LECTURE #29

Closest-Points Problem

Dynamic Programming
Using a Table Instead of Recursion

Randomized Algorithms
Backtracking Algorithms
The Turnpike Reconstruction Problem


Huffman Codes
The Huffman code is a greedy algorithm used for file compression.
For example, a file contains only a, e, i, s, t, spaces, and newlines.
This file requires 174 bits to represent, since each character
requires three bits.

10.1.2

Huffman Codes
In large files, there is usually a big disparity between
the most frequent and least frequent characters.
Huffman codes allow the code length to vary from
character to character and ensure that frequently
occurring characters have short codes.
Huffman codes are efficient in representing data
(resulting in the removal of data redundancy).
If all the characters occur with the same frequency,
there are not likely to be any savings.

Huffman Codes
The binary code that represents the alphabet can be
represented by a binary tree.
The representation of each character can be found by starting at
the root and recording the path, using a 0 to indicate the left
branch and a 1 to indicate the right branch.
For instance, s is reached by going left, then right, and finally
right. This is encoded as 011.

Huffman Codes
Since the newline is an only child, we can place the
newline one level higher, at its parent.
This saves 1 bit per newline, and the
new tree has a cost of 173.

10.1.2

Prefix Code
If the characters are placed only at the leaves, any
sequence of bits can always be decoded
unambiguously.
For instance, suppose
010011110001011000100011 is the encoded
string. 0 is not a character code, 01 is not a
character code, but 010 represents i.
It does not matter if the character codes are of
different lengths, as long as no character code is a
prefix of another character code.
10.1.2

Prefix Code
The optimal prefix code:

10.1.2

Huffman's Algorithm
Assume the number of characters is C.
Huffman's Algorithm:
Maintain a forest of trees. The weight of a tree is
equal to the sum of the frequencies of its leaves.
10.1.2

Huffman's Algorithm
C−1 times, select the two trees, T1 and T2, of
smallest weight, breaking ties arbitrarily, and form
a new tree with subtrees T1 and T2.
Huffman's algorithm after the first merge:

10.1.2
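The merging loop can be sketched with a binary heap acting as the forest. The tie-breaking counter is only an implementation detail (the algorithm breaks ties arbitrarily), and trees are represented as nested tuples with characters at the leaves:

```python
import heapq
from itertools import count

def huffman_codes(freq):
    # Build the Huffman tree by repeatedly merging the two lightest
    # trees, then derive codes by walking it (0 = left, 1 = right).
    tie = count()                     # keeps heap entries comparable
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:              # C-1 merges in total
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))

    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):   # internal node
            walk(tree[0], prefix + '0')
            walk(tree[1], prefix + '1')
        else:                         # leaf: characters only at leaves,
            codes[tree] = prefix or '0'   # so the result is a prefix code
    walk(heap[0][2], '')
    return codes
```

The total cost of the encoding is the sum of frequency times code length over all characters.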

Huffman's Algorithm
Huffman's algorithm after the second merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the third merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the fourth merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the fifth merge:

10.1.2

Huffman's Algorithm
Huffman's algorithm after the final merge:

10.1.2

Huffman's Algorithm
There are two details that must be considered.
Transmission of the Code Book:
The encoding information must be transmitted at the
start of the compressed file, since otherwise it will be
impossible to decode.

Two-Pass Algorithm:
The first pass collects the frequency data and the
second pass does the encoding.
10.1.2

Divide and Conquer


Divide-and-conquer algorithms consist of two parts:
Divide: Smaller problems are solved recursively
Conquer: The solution to the original problem is then
formed from the solutions to the subproblems.

Past Examples:
Chapter 2: Maximum Subsequence Sum Problem with an O(N
log N) solution
Chapter 4: Linear-time tree traversal strategies (preorder
& postorder traversal)
Chapter 7: Mergesort and quicksort
10.2

Closest-Points Problem
In the Closest-Points Problem, we are required to find the
closest pair of points.
Below is a small point set.

10.2.2

Closest-Points Problem
We can compute dL and dR recursively. Then how about dC?
P is partitioned into PL and PR; the shortest distances are shown.

10.2.2

Closest-Points Problem
Let δ = min(dL, dR). We only need to compute dC if dC improves on δ.
Below is the two-lane strip containing all points considered for dC.

10.2.2

Closest-Points Problem
For large point sets that are uniformly distributed,
the number of points expected to be in the
strip is very small.
In this case, we can use a brute-force calculation of
min(δ, dC).

10.2.2

Closest-Points Problem
In the worst case, all the points could be in the strip.
We need a better algorithm, using a refined calculation
of min(δ, dC).

10.2.2

Closest-Points Problem
For p3, only p4 and p5 are considered in the second
for loop, since only they lie in the strip within vertical
distance δ.

10.2.2

Dynamic Programming
A problem that can be mathematically expressed
recursively can also be expressed as a recursive
algorithm.
But a direct translation of the
recursive formula may not give an efficient
program.
Dynamic programming rewrites the recursive
algorithm as a nonrecursive algorithm that
systematically records the answers to the
subproblems in a table, so each subproblem
is solved only once.
10.3

Using a Table Instead of Recursion


Inefficient algorithm to compute Fibonacci numbers:

FN depends on FN-1 and FN-2


FN-1 depends on FN-2 and FN-3
FN-2 depends on FN-3 and FN-4

so the same values are recomputed over and over.

10.3.1

Using a Table Instead of Recursion


Trace of the recursive calculaVon of Fibonacci numbers

10.3.1

Using a Table Instead of Recursion


A linear algorithm is more efficient: remove the
recursive calls and fill in a table instead.

10.3.1
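The table-based Fibonacci computation can be sketched as:

```python
def fib_table(n):
    # Fill the table bottom-up: each F_i is computed exactly once,
    # giving O(n) time instead of exponential recursion.
    if n <= 1:
        return n
    table = [0] * (n + 1)
    table[1] = 1
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

Since only the two previous entries are ever read, the table can even be reduced to two variables.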

Using a Table Instead of Recursion


Another recursive example:

10.3.1

Using a Table Instead of Recursion


Trace of the recursive calculaVon in eval:

10.3.1

Using a Table Instead of Recursion


ImplementaVon with a table:

10.3.1

Backtracking Algorithms
A backtracking algorithm usually doesn't have
good performance, but in many cases it gives
significant savings over a brute-force exhaustive
search.
For example, O(N^2) is not good, but it is
significantly better than an O(N^5) algorithm.
Backtracking example: the Turnpike
Reconstruction Problem
10.5

The Turnpike ReconstrucVon Problem


The turnpike reconstrucVon problem is to
reconstruct a point set from the distances.
The given algorithm typically runs in O(N2log N)
but can take exponenVal Vme in the worst case.
As an example:

10.5.1

The Turnpike ReconstrucVon Problem


The turnpike reconstrucVon problem is to
reconstruct a point set from the distances.
The given algorithm typically runs in O(N2log N)
but can take exponenVal Vme in the worst case.
As an example:

10.5.1

The Turnpike ReconstrucVon Problem


IDI = 15, N=6.
Clearly, we can put x6 = 10 onto the Vmeline.
We remove 10 from D and the remaining
distances are as shown in below:

10.5.1

The Turnpike ReconstrucVon Problem


The largest remaining distance is 8, which means that
either x2 = 2 or x5 = 8.
By symmetry, both choices lead to soluVons (which
are mirror images of each other).
We can remove the distances x6-x5 = 2 and x5-x1 = 8
from D.

10.5.1

The Turnpike ReconstrucVon Problem


Since 7 is the largest value in D, either x4 = 7 or x2 = 3.
If x4 = 7, then x6 7 = 3 and x5 7 = 1 must also be present in D.
=> And this is TRUE.
If x2 = 3, then 3 x1 = 3 and x5 3 = 5 must also be present in D.
=> And this is TRUE.
So this step is not obvious. Trying that rst choice x4 = 7.
We can remove the distances x6 7 = 3 and x5 7 = 1 from D.

10.5.1

The Turnpike ReconstrucVon Problem


Now 6 is the largest value in D, either x3 = 6 or x2 = 4.
If x3 = 6 , then x4 x3 = 1. => And this is IMPOSSIBLE.
If x2 = 4, then x2 x0 = 4 and x5 x2 = 4 => And this is
IMPOSSIBLE.
So we have to backtrack and determine that x4 = 7 wont work.
Now trying x2 = 3 for 7.

10.5.1

The Turnpike ReconstrucVon Problem


Once again, for 6, we have to choose between x4 = 6 and x3 = 4.
x3 = 4 is IMPOSSIBLE because D only has one occurrence of 4
and this choice needs two 4s.
If x4 = 6, then we need 6, 2, 4, and 3. That will work.

10.5.1

The Turnpike ReconstrucVon Problem


The only remaining choice is to assign x3 = 5.
This leaves D empty and we have a soluVon.

10.5.1

The Turnpike ReconstrucVon Problem


Decision tree for the worked turnpike reconstruction example

10.5.1

The Turnpike ReconstrucVon Problem


Turnpike reconstrucVon algorithm: driver rouVne (pseudocode)

10.5.1
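The driver and the backtracking step can be sketched as follows. This is a compact illustrative variant of the textbook routine, with the multiset D kept in a Counter; it tries the largest remaining distance as a point measured from either end and undoes the choice on failure:

```python
from collections import Counter

def turnpike(dists):
    # dists: the multiset D of all pairwise distances.
    d = Counter(dists)
    width = max(d)                    # x1 = 0 and xN = width are forced
    points = {0, width}
    d[width] -= 1
    if d[width] == 0:
        del d[width]

    def place(x):
        # Try x as the next point; undo everything on failure.
        need = Counter(abs(x - p) for p in points)
        if any(d[k] < v for k, v in need.items()):
            return False              # some required distance is missing
        for k, v in need.items():
            d[k] -= v
            if d[k] == 0:
                del d[k]
        points.add(x)
        if solve():
            return True
        points.remove(x)              # backtrack
        for k, v in need.items():
            d[k] += v
        return False

    def solve():
        if not d:
            return True               # D exhausted: points is a solution
        y = max(d)                    # largest distance is measured from
        return place(y) or place(width - y)   # one end or the other

    return sorted(points) if solve() else None
```

On the worked example the routine reproduces the same search, including the failed x4 = 7 branch and the eventual solution.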
