Clifford A. Shaffer
Department of Computer Science
Virginia Tech
Copyright © 2008
*Temporarily listed as CS2984, this course replaces CS2606.
Goals of this Course
1. Reinforce the concept that costs and
benefits exist for every data structure.
2. Learn the commonly used data
structures.
– These form a programmer's basic data
structure "toolkit."
3. Understand how to measure the cost of a
data structure or program.
– These techniques also allow you to judge the
merits of new data structures that you or
others might invent.
The Need for Data Structures
Data structures organize data
⇒ more efficient programs.
More powerful computers
⇒ more complex applications.
More complex applications demand more
calculations.
Complex computing tasks are unlike our
everyday experience.
Organizing Data
Any organization for a collection of records
can be searched, processed in any order,
or modified.
The choice of data structure and algorithm
can make the difference between a
program running in a few seconds or many
days.
Efficiency
A solution is said to be efficient if it solves
the problem within its resource constraints.
– Space
– Time
• The cost of a solution is the amount of
resources that the solution consumes.
Selecting a Data Structure
Select a data structure as follows:
1. Analyze the problem to determine the
basic operations that must be supported.
2. Quantify the resource constraints for
each operation.
3. Select the data structure that best meets
these requirements.
Some Questions to Ask
• Are all data inserted into the data structure
at the beginning, or are insertions
interspersed with other operations?
• Can data be deleted?
• Are all data processed in some well-
defined order, or is random access
allowed?
Costs and Benefits
Each data structure has costs and benefits.
Rarely is one data structure better than
another in all situations.
Any data structure requires:
– space for each data item it stores,
– time to perform each basic operation,
– programming effort.
Costs and Benefits (cont)
Each problem has constraints on available
space and time.
Only after a careful analysis of problem
characteristics can we know the best data
structure for the task.
Bank example:
– Start account: a few minutes
– Transactions: a few seconds
– Close account: overnight
Example 1.2
Problem: Create a database containing
information about cities and towns.
Tasks: Find by name or attribute or
location
• Exact match, range query, spatial query
Resource requirements: Times can be
from a few seconds for simple queries
to a minute or two for complex queries
Scheduling
What is the Mechanism?
Estimation Techniques
Known as “back of the envelope” or
“back of the napkin” calculation
1. Determine the major parameters that affect the
problem.
2. Derive an equation that relates the parameters
to the problem.
3. Select values for the parameters, and apply
the equation to yield an estimated solution.
Estimation Example
How many library bookcases does it
take to store books totaling one
million pages?
Estimate:
– Pages/inch
– Feet/shelf
– Shelves/bookcase
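As a quick sketch of the technique applied to this example, the parameter values below are illustrative guesses, not measured facts:

```java
public class BookcaseEstimate {
    // All parameter values are assumptions for illustration.
    static final int PAGES_PER_INCH = 500;     // paperback-ish density
    static final int FEET_PER_SHELF = 3;
    static final int SHELVES_PER_BOOKCASE = 5;

    /** Estimate bookcases needed to hold the given number of pages. */
    static int bookcases(int pages) {
        int pagesPerShelf = PAGES_PER_INCH * FEET_PER_SHELF * 12; // 12 in/ft
        int pagesPerBookcase = pagesPerShelf * SHELVES_PER_BOOKCASE;
        return (pages + pagesPerBookcase - 1) / pagesPerBookcase; // round up
    }

    public static void main(String[] args) {
        System.out.println(bookcases(1_000_000) + " bookcases"); // about a dozen
    }
}
```

Doubling any guessed parameter roughly halves or doubles the answer, which is the point of the exercise: the estimate is about order of magnitude, not precision.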
Abstract Data Types
Abstract Data Type (ADT): a definition for a
data type solely in terms of a set of values
and a set of operations on that data type.
Each ADT operation is defined by its inputs
and outputs.
Encapsulation: Hide implementation details.
Data Structure
• A data structure is the physical
implementation of an ADT.
– Each operation associated with the ADT is
implemented by one or more subroutines in
the implementation.
• Data structure usually refers to an
organization for data in main memory.
• File structure: an organization for data on
peripheral storage, such as a disk drive.
Metaphors
An ADT manages complexity through
abstraction: metaphor.
– Hierarchies of labels
(Figure: an ADT defines a type, its data items in logical
form, and its operations.)
Algorithm Efficiency (cont)
Goal (1) is the concern of Software
Engineering.
Goal (2) is the concern of data structures
and algorithm analysis.
How to Measure Efficiency?
1. Empirical comparison (run programs)
2. Asymptotic Algorithm Analysis
Critical resources:
Examples (cont)
Example 2: Assignment statement.
Example 3:
sum = 0;
for (i=1; i<=n; i++)
  for (j=1; j<=n; j++)
    sum++;
Growth Rate Graph
Best, Worst, Average Cases
Not all inputs of a given size take the same
time to run.
Sequential search for K in an array of n
integers:
• Begin at first element in array and look at
each element in turn until K is found
Best case: 1 comparison (K is the first element).
Worst case: n comparisons (K is last, or absent).
Average case: about n/2 comparisons.
Which Analysis to Use?
While average time appears to be the fairest
measure, it may be difficult to determine.
Faster Computer or Algorithm?
Suppose we buy a computer 10 times faster.
n: size of input that can be processed in one second
on old computer (in 1000 computational units)
n’: size of input that can be processed in one second
on new computer (in 10,000 computational units)
Asymptotic Analysis: Big-oh
Definition: For T(n) a non-negatively valued
function, T(n) is in the set O(f(n)) if there
exist two positive constants c and n0
such that T(n) <= cf(n) for all n > n0.
Use: The algorithm is in O(n^2) in [best, average,
worst] case.
Meaning: For all data sets big enough (i.e., n>n0),
the algorithm always executes in less than
cf(n) steps in [best, average, worst] case.
Big-oh Notation (cont)
Big-oh notation indicates an upper bound.
Big-Oh Examples
Example 1: Finding value X in an array
(average cost).
Then T(n) = csn/2.
For all values of n > 1, csn/2 <= csn.
Therefore, the definition is satisfied for
f(n)=n, n0 = 1, and c = cs.
Hence, T(n) is in O(n).
Big-Oh Examples
Example 2: Suppose T(n) = c1n^2 + c2n, where c1
and c2 are positive.
c1n^2 + c2n <= c1n^2 + c2n^2 <= (c1 + c2)n^2 for all n > 1.
Then T(n) <= cn^2 whenever n > n0, for c = c1 + c2
and n0 = 1.
Therefore, T(n) is in O(n^2) by definition.
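The algebra above can be sanity-checked numerically; this sketch uses the illustrative constants c1 = 2 and c2 = 3 (so c = 5, n0 = 1):

```java
public class BigOhCheck {
    /** T(n) = c1*n^2 + c2*n with illustrative constants c1=2, c2=3. */
    static long t(long n) { return 2 * n * n + 3 * n; }

    /** Upper bound c*n^2 with c = c1 + c2 = 5. */
    static long bound(long n) { return 5 * n * n; }

    public static void main(String[] args) {
        // Verify T(n) <= c*n^2 for all sampled n > n0 = 1.
        for (long n = 2; n <= 1_000_000; n *= 10)
            if (t(n) > bound(n))
                throw new AssertionError("bound violated at n=" + n);
        System.out.println("T(n) <= 5n^2 holds on the sampled range");
    }
}
```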
A Common Misunderstanding
“The best case for my algorithm is n=1
because that is the fastest.” WRONG!
Big-oh refers to a growth rate as n grows to
∞.
Best case is defined for the input of size n
that is cheapest among all inputs of size
n.
Big-Omega
Definition: For T(n) a non-negatively valued
function, T(n) is in the set Ω(g(n)) if there
exist two positive constants c and n0
such that T(n) >= cg(n) for all n > n0.
Lower bound.
Big-Omega Example
Theta Notation
When big-Oh and Ω coincide, we indicate
this by using Θ (big-Theta) notation.
A Common Misunderstanding
Confusing worst case with upper bound.
Simplifying Rules
1. If f(n) is in O(g(n)) and g(n) is in O(h(n)),
then f(n) is in O(h(n)).
2. If f(n) is in O(kg(n)) for some constant k >
0, then f(n) is in O(g(n)).
3. If f1(n) is in O(g1(n)) and f2(n) is in
O(g2(n)), then (f1 + f2)(n) is in
O(max(g1(n), g2(n))).
4. If f1(n) is in O(g1(n)) and f2(n) is in
O(g2(n)) then f1(n)f2(n) is in O(g1(n)g2(n)).
Time Complexity Examples (1)
Example 3.9: a = b;
This assignment takes constant time, so it is
Θ(1).
Example 3.10:
sum = 0;
for (i=1; i<=n; i++)
sum += n;
Time Complexity Examples (2)
Example 3.11:
sum = 0;
for (j=1; j<=n; j++)
for (i=1; i<=j; i++)
sum++;
for (k=0; k<n; k++)
A[k] = k;
Time Complexity Examples (3)
Example 3.12:
sum1 = 0;
for (i=1; i<=n; i++)
for (j=1; j<=n; j++)
sum1++;
sum2 = 0;
for (i=1; i<=n; i++)
for (j=1; j<=i; j++)
sum2++;
Time Complexity Examples (4)
Example 3.13:
sum1 = 0;
for (k=1; k<=n; k*=2)
for (j=1; j<=n; j++)
sum1++;
sum2 = 0;
for (k=1; k<=n; k*=2)
for (j=1; j<=k; j++)
sum2++;
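The two loop pairs above can be counted empirically. For n a power of 2, the outer loop runs log2(n) + 1 times, so the first sum is n(log2 n + 1) while the second is 2n - 1; a small sketch confirming this:

```java
public class LogLoops {
    /** Count iterations of the first double loop: k doubles, j runs to n. */
    static int sum1(int n) {
        int sum = 0;
        for (int k = 1; k <= n; k *= 2)
            for (int j = 1; j <= n; j++)
                sum++;
        return sum;
    }

    /** Count iterations of the second double loop: j runs only to k. */
    static int sum2(int n) {
        int sum = 0;
        for (int k = 1; k <= n; k *= 2)
            for (int j = 1; j <= k; j++)
                sum++;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum1(16)); // 16 * 5 = 80, i.e. Theta(n log n)
        System.out.println(sum2(16)); // 1+2+4+8+16 = 31, i.e. Theta(n)
    }
}
```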
Binary Search
Binary Search
// Return the position of an element in "A"
// with value "K". If "K" is not in "A",
// return A.length.
static int binary(int[] A, int K) {
int l = -1; // Set l and r
int r = A.length; // beyond array bounds
while (l+1 != r) { // Stop when l, r meet
int i = (l+r)/2; // Check middle
if (K < A[i]) r = i; // In left half
if (K == A[i]) return i; // Found it
if (K > A[i]) l = i; // In right half
}
return A.length; // Search value not in A
}
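A minimal, self-contained check of this search (the method is repeated so the sketch compiles on its own):

```java
public class BinarySearchDemo {
    /** Return the position of K in sorted array A, or A.length if absent. */
    static int binary(int[] A, int K) {
        int l = -1;               // l and r start
        int r = A.length;         // beyond array bounds
        while (l + 1 != r) {      // stop when l, r meet
            int i = (l + r) / 2;  // check middle of remaining subarray
            if (K < A[i]) r = i;  // in left half
            if (K == A[i]) return i;
            if (K > A[i]) l = i;  // in right half
        }
        return A.length;          // search value not in A
    }

    public static void main(String[] args) {
        int[] A = {1, 3, 5, 7, 9};
        System.out.println(binary(A, 7)); // found at index 3
        System.out.println(binary(A, 4)); // absent: prints 5 (A.length)
    }
}
```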
Other Control Statements
while loop: Analyze like a for loop.
Problems
• Problem: a task to be performed.
– Best thought of as inputs and matching
outputs.
– Problem definition should include constraints
on the resources that may be consumed by
any acceptable solution.
Problems (cont)
• Problems can be viewed as mathematical functions
– A function is a matching between inputs (the
domain) and outputs (the range).
– An input to a function may be single number,
or a collection of information.
– The values making up an input are called the
parameters of the function.
– A particular input must always result in the
same output every time the function is
computed.
Algorithms and Programs
Algorithm: a method or a process followed to
solve a problem.
– A recipe.
Analyzing Problems
Upper bound: Upper bound of best known
algorithm.
Space/Time Tradeoff Principle
One can often reduce time if one is willing to
sacrifice space, or vice versa.
• Encoding or packing information
Boolean flags
• Table lookup
Factorials
Time: Algorithm
Space: Data Structure
Lists
A list is a finite, ordered sequence of data
items.
List Implementation Concepts
Our list implementation will support the
concept of a current position.
List ADT
public interface List<E> {
public void clear();
public void insert(E item);
public void append(E item);
public E remove();
public void moveToStart();
public void moveToEnd();
public void prev();
public void next();
public int length();
public int currPos();
public void moveToPos(int pos);
public E getValue();
}
List ADT Examples
List: <12 | 32, 15>
L.insert(99);
Result: <12 | 99, 32, 15>
Array-Based List Class (1)
class AList<E> implements List<E> {
private static final int defaultSize = 10;
// Constructors
AList() { this(defaultSize); }
@SuppressWarnings("unchecked")
AList(int size) {
maxSize = size;
listSize = curr = 0;
listArray = (E[])new Object[size];
}
Array-Based List Class (2)
public void clear()
{ listSize = curr = 0; }
public void moveToStart() { curr = 0; }
public void moveToEnd() { curr = listSize; }
public void prev() { if (curr != 0) curr--; }
public void next()
{ if (curr < listSize) curr++; }
public int length() { return listSize; }
public int currPos() { return curr; }
Array-Based List Class (3)
public void moveToPos(int pos) {
assert (pos>=0) && (pos<=listSize) :
"Position out of range";
curr = pos;
}
public E getValue() {
assert (curr >= 0) && (curr < listSize) :
"No current element";
return listArray[curr];
}
Insert
/** Insert "it" at current position */
public void insert(E it) {
assert listSize < maxSize :
"List capacity exceeded";
for (int i=listSize; i>curr; i--)
listArray[i] = listArray[i-1];
listArray[curr] = it;
listSize++;
}
Append
public void append(E it) { // Append "it"
assert listSize < maxSize :
"List capacity exceeded";
listArray[listSize++] = it;
}
Remove
// Remove and return the current element.
public E remove() {
assert (curr >= 0) && (curr < listSize) :
"No current element";
E it = listArray[curr];
for(int i=curr; i<listSize-1; i++)
listArray[i] = listArray[i+1];
listSize--;
return it;
}
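The element shifting in insert and remove is what makes both operations Θ(n). A stripped-down int-array sketch of the same shifting logic (the names here are illustrative, not the AList fields):

```java
import java.util.Arrays;

public class ShiftDemo {
    /** Insert val at pos, shifting elements right; returns new size. */
    static int insert(int[] a, int size, int pos, int val) {
        for (int i = size; i > pos; i--)  // shift tail up one slot
            a[i] = a[i - 1];
        a[pos] = val;
        return size + 1;
    }

    /** Remove the element at pos, shifting elements left; returns new size. */
    static int remove(int[] a, int size, int pos) {
        for (int i = pos; i < size - 1; i++)  // shift tail down one slot
            a[i] = a[i + 1];
        return size - 1;
    }

    public static void main(String[] args) {
        int[] a = new int[10];
        int size = 0;
        size = insert(a, size, 0, 12);
        size = insert(a, size, 1, 15);
        size = insert(a, size, 1, 99);  // <12, 99, 15>
        System.out.println(Arrays.toString(Arrays.copyOf(a, size)));
        size = remove(a, size, 1);      // <12, 15>
        System.out.println(Arrays.toString(Arrays.copyOf(a, size)));
    }
}
```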
Link Class
Dynamic allocation of new list elements.
class Link<E> {
private E element;
private Link<E> next;
// Constructors
Link(E it, Link<E> nextval)
{ element = it; next = nextval; }
Link(Link<E> nextval) { next = nextval; }
Linked List Position (2)
Linked List Class (1)
class LList<E> implements List<E> {
private Link<E> head;
private Link<E> tail;
protected Link<E> curr;
int cnt;
//Constructors
LList(int size) { this(); }
LList() {
curr = tail = head = new Link<E>(null);
cnt = 0;
}
Linked List Class (2)
public void clear() {
head.setNext(null);
curr = tail = head = new Link<E>(null);
cnt = 0;
}
public void moveToStart() { curr = head; }
public void moveToEnd() { curr = tail; }
public int length() { return cnt; }
public void next() {
if (curr != tail) { curr = curr.next(); }
}
public E getValue() {
assert curr.next() != null :
"Nothing to get";
return curr.next().element();
}
Insertion
Insert/Append
// Insert "it" at current position
public void insert(E it) {
curr.setNext(new Link<E>(it, curr.next()));
if (tail == curr) tail = curr.next();
cnt++;
}
public void append(E it) {
tail = tail.setNext(new Link<E>(it, null));
cnt++;
}
Removal
Remove
/** Remove and return current element */
public E remove() {
if (curr.next() == null) return null;
E it = curr.next().element();
if (tail == curr.next()) tail = curr;
curr.setNext(curr.next().next());
cnt--;
return it;
}
Prev
/** Move curr one step left;
no change if already at front */
public void prev() {
if (curr == head) return;
Link<E> temp = head;
// March down list until we find the
// previous element
while (temp.next() != curr)
temp = temp.next();
curr = temp;
}
Get/Set Position
/** Return position of the current element */
public int currPos() {
Link<E> temp = head;
int i;
for (i=0; curr != temp; i++)
temp = temp.next();
return i;
}
/** Move down list to "pos" position */
public void moveToPos(int pos) {
assert (pos>=0) && (pos<=cnt) :
"Position out of range";
curr = head;
for(int i=0; i<pos; i++)
curr = curr.next();
}
Comparison of Implementations
Array-Based Lists:
• Insertion and deletion are Θ(n).
• Prev and direct access are Θ(1).
• Array must be allocated in advance.
• No overhead if all array positions are full.
Linked Lists:
• Insertion and deletion are Θ(1).
• Prev and direct access are Θ(n).
• Space grows with number of elements.
• Every element requires overhead.
Space Comparison
“Break-even” point:
DE = n(P + E)
n = DE / (P + E)
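Plugging illustrative numbers into the break-even formula, where D is the array capacity, E the element size, and P the pointer size (all values below are assumptions):

```java
public class BreakEven {
    /** n = DE / (P + E): below this many elements, the linked list
        uses less space than an array of capacity D. */
    static double breakEven(int d, int e, int p) {
        return (double) d * e / (p + e);
    }

    public static void main(String[] args) {
        // Capacity 100, 8-byte elements, 4-byte links (illustrative).
        System.out.println(breakEven(100, 8, 4)); // about 66.7 elements
    }
}
```

With fewer than about 67 elements, the linked list wins on space for these values; beyond that, the array does.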
Link Class Extensions
static Link freelist = null;
static <E> Link<E> get(E it, Link<E> nextval) {
if (freelist == null)
return new Link<E>(it, nextval);
Link<E> temp = freelist;
freelist = freelist.next();
temp.setElement(it);
temp.setNext(nextval);
return temp;
}
void release() { // Return to freelist
element = null;
next = freelist;
freelist = this;
}
Using Freelist
public void insert(E it) {
curr.setNext(Link.get(it, curr.next()));
if (tail == curr) tail = curr.next();
cnt++;
}
public E remove() {
if (curr.next() == null) return null;
E it = curr.next().element();
if (tail == curr.next()) tail = curr;
Link<E> tempptr = curr.next();
curr.setNext(curr.next().next());
tempptr.release();
cnt--;
return it;
}
Doubly Linked Lists
class DLink<E> {
private E element;
private DLink<E> next;
private DLink<E> prev;
DLink(E it, DLink<E> n, DLink<E> p)
{ element = it; next = n; prev = p; }
DLink(DLink<E> n, DLink<E> p)
{ next = n; prev = p; }
DLink<E> next() { return next; }
DLink<E> setNext(DLink<E> nextval)
{ return next = nextval; }
DLink<E> prev() { return prev; }
DLink<E> setPrev(DLink<E> prevval)
{ return prev = prevval; }
E element() { return element; }
E setElement(E it) { return element = it; }
}
Doubly Linked Lists
Doubly Linked Insert
Doubly Linked Insert
public void insert(E it) {
curr.setNext(new DLink<E>(it, curr.next(),
curr));
if (curr.next().next() != null)
curr.next().next().setPrev(curr.next());
if (tail == curr) tail = curr.next();
cnt++;
}
Doubly Linked Remove
Doubly Linked Remove
public E remove() {
if (curr.next() == null) return null;
E it = curr.next().element();
if (curr.next().next() != null)
curr.next().next().setPrev(curr);
else tail = curr;
curr.setNext(curr.next().next());
cnt--;
return it;
}
Stacks
LIFO: Last In, First Out.
Stack ADT
public interface Stack<E> {
/** Reinitialize the stack. */
public void clear();
Issues:
• Which end is the top?
• Where does “top” point to?
• What are the costs of the operations?
Linked Stack
class LStack<E> implements Stack<E> {
private Link<E> top;
private int size;
Queues
FIFO: First in, First Out
Queue Implementation (1)
Queue Implementation (2)
Dictionary
Often want to insert records, delete records,
search for records.
Required concepts:
• Search key: Describe what we are looking
for
• Key comparison
– Equality: sequential search
– Relative order: sorting
Records and Keys
• Problem: How do we extract the key
from a record?
• Records can have multiple keys.
• Fundamentally, the key is not a property
of the record, but of the context.
• Solution: We will explicitly store the key
with the record.
Dictionary ADT
public interface Dictionary<K, E> {
Payroll Class
// Simple payroll entry: ID, name, address
class Payroll {
private Integer ID;
private String name;
private String address;
Payroll(int inID, String inname, String inaddr)
{
ID = inID;
name = inname;
address = inaddr;
}
public Integer getID() { return ID; }
public String getname() { return name; }
public String getaddr() { return address; }
}
Using Dictionary
// IDdict organizes Payroll records by ID
Dictionary<Integer, Payroll> IDdict =
new UALdictionary<Integer, Payroll>();
// namedict organizes Payroll records by name
Dictionary<String, Payroll> namedict =
new UALdictionary<String, Payroll>();
Payroll foo1 = new Payroll(5, "Joe", "Anytown");
Payroll foo2 = new Payroll(10, "John", "Mytown");
IDdict.insert(foo1.getID(), foo1);
IDdict.insert(foo2.getID(), foo2);
namedict.insert(foo1.getname(), foo1);
namedict.insert(foo2.getname(), foo2);
Payroll findfoo1 = IDdict.find(5);
Payroll findfoo2 = namedict.find("John");
Unsorted List Dictionary
class UALdictionary<K, E>
implements Dictionary<K, E> {
private static final int defaultSize = 10;
private AList<KVpair<K,E>> list;
// Constructors
UALdictionary() { this(defaultSize); }
UALdictionary(int sz)
{ list = new AList<KVpair<K, E>>(sz); }
public void clear() { list.clear(); }
/** Insert an element: append to list */
public void insert(K k, E e) {
KVpair<K,E> temp = new KVpair<K,E>(k, e);
list.append(temp);
}
Sorted vs. Unsorted List
Dictionaries
• If list were sorted
– Could use binary search to speed search
– Would need to insert in order, slowing
insert
• Which is better?
– If lots of searches, sorted list is good
– If inserts are as likely as searches, then
sorting is no benefit.
Binary Trees
A binary tree is made up of a finite set of
nodes that is either empty or consists of a
node called the root together with two
binary trees, called the left and right
subtrees, which are disjoint from each
other and from the root.
Binary Tree Example
Notation: Node,
children, edge,
parent, ancestor,
descendant, path,
depth, height, level,
leaf node, internal
node, subtree.
Full and Complete Binary Trees
Full binary tree: Each node is either a leaf or
internal node with exactly two non-empty children.
Complete binary tree: If the height of the tree is d,
then all leaves except possibly level d are
completely full. The bottom level has all nodes to
the left side.
Full Binary Tree Theorem (1)
Theorem: The number of leaves in a non-empty
full binary tree is one more than the number of
internal nodes.
Proof (by Mathematical Induction):
Base case: A full binary tree with 1 internal node must
have two leaf nodes.
Induction Hypothesis: Assume any full binary tree T
containing n-1 internal nodes has n leaves.
Full Binary Tree Theorem (2)
Induction Step: Given tree T with n internal
nodes, pick internal node I with two leaf children.
Remove I's children, call resulting tree T'.
By induction hypothesis, T’ is a full binary tree with
n leaves.
Restore I's two children. The number of internal
nodes has now gone up by 1 to reach n. The
number of leaves has also gone up by 1.
Full Binary Tree Corollary
Theorem: The number of null pointers in a
non-empty tree is one more than the
number of nodes in the tree.
Binary Tree Node Class
/** ADT for binary tree nodes */
public interface BinNode<E> {
/** Return and set the element value */
public E element();
public E setElement(E v);
/** Return the left child */
public BinNode<E> left();
/** Return the right child */
public BinNode<E> right();
/** Return true if this is a leaf node */
public boolean isLeaf();
}
Traversals (1)
Any process for visiting the nodes in
some order is called a traversal.
Any traversal that lists every node in
the tree exactly once is called an
enumeration of the tree's nodes.
Traversals (2)
• Preorder traversal: Visit each node before
visiting its children.
• Postorder traversal: Visit each node after
visiting its children.
• Inorder traversal: Visit the left subtree,
then the node, then the right subtree.
Traversals (3)
/** @param rt The root of the subtree */
void preorder(BinNode rt)
{
if (rt == null) return; // Empty subtree
visit(rt);
preorder(rt.left());
preorder(rt.right());
}
void preorder2(BinNode rt) // Not so good
{
visit(rt);
if (rt.left() != null)
preorder(rt.left());
if (rt.right() != null)
preorder(rt.right());
}
Recursion Examples
int count(BinNode rt) {
if (rt == null) return 0;
return 1 + count(rt.left()) +
count(rt.right());
}
boolean checkBST(BSTNode<Integer,Integer> root,
Integer low, Integer high) {
if (root == null) return true;
Integer rootkey = root.key();
if ((rootkey < low) || (rootkey > high))
return false; // Out of range
if (!checkBST(root.left(), low, rootkey))
return false; // Left side failed
return checkBST(root.right(), rootkey, high);
}
Binary Tree Implementation (1)
Binary Tree Implementation (2)
Inheritance (1)
public interface VarBinNode {
public boolean isLeaf();
}
class VarLeafNode implements VarBinNode {
private String operand;
public VarLeafNode(String val)
{ operand = val; }
public boolean isLeaf() { return true; }
public String value() { return operand; }
};
Inheritance (2)
/** Internal node */
class VarIntlNode implements VarBinNode {
private VarBinNode left;
private VarBinNode right;
private Character operator;
public VarIntlNode(Character op,
VarBinNode l, VarBinNode r)
{ operator = op; left = l; right = r; }
public boolean isLeaf() { return false; }
public VarBinNode leftchild() { return left; }
public VarBinNode rightchild(){ return right; }
public Character value() { return operator; }
}
Inheritance (3)
/** Preorder traversal */
public static void traverse(VarBinNode rt) {
if (rt == null) return;
if (rt.isLeaf())
Visit.VisitLeafNode(((VarLeafNode)rt).value());
else {
Visit.VisitInternalNode(
((VarIntlNode)rt).value());
traverse(((VarIntlNode)rt).leftchild());
traverse(((VarIntlNode)rt).rightchild());
}
}
Composition (1)
public interface VarBinNode {
public boolean isLeaf();
public void traverse();
}
class VarLeafNode implements VarBinNode {
private String operand;
public VarLeafNode(String val)
{ operand = val; }
public boolean isLeaf() { return true; }
public String value() { return operand; }
public void traverse()
{ Visit.VisitLeafNode(operand); }
}
Composition (2)
class VarIntlNode implements VarBinNode {
private VarBinNode left;
private VarBinNode right;
private Character operator;
public VarIntlNode(Character op,
VarBinNode l, VarBinNode r)
{ operator = op; left = l; right = r; }
public boolean isLeaf() { return false; }
public VarBinNode leftchild() { return left; }
public VarBinNode rightchild()
{ return right; }
public Character value() { return operator; }
public void traverse() {
Visit.VisitInternalNode(operator);
if (left != null) left.traverse();
if (right != null) right.traverse();
}
}
Composition (3)
/** Preorder traversal */
public static void traverse(VarBinNode rt) {
if (rt != null) rt.traverse();
}
Space Overhead (1)
From the Full Binary Tree Theorem:
• Half of the pointers are null.
If leaves store only data, then overhead
depends on whether the tree is full.
Ex: Full tree, all nodes the same, with two pointers
to children and one to element:
• Total space required is (3p + d)n
• Overhead: 3pn
• If p = d, this means 3p/(3p + d) = 3/4 overhead.
Space Overhead (2)
Eliminate pointers from the leaf nodes:
((n/2)(2p) + np) / ((n/2)(2p) + np + dn) = 2p / (2p + d)
This is 2/3 if p = d.
If data is stored only at the leaves:
(2p + p)/(2p + d + p) = 3p/(3p + d),
i.e. 3/4 overhead when p = d.
Note that some method is needed to
distinguish leaves from internal nodes.
Array Implementation (1)
Position 0 1 2 3 4 5 6 7 8 9 10 11
Parent -- 0 0 1 1 2 2 3 3 4 4 5
Left Child 1 3 5 7 9 11 -- -- -- -- -- --
Right Child 2 4 6 8 10 -- -- -- -- -- -- --
Left Sibling -- -- 1 -- 3 -- 5 -- 7 -- 9 --
Right Sibling -- 2 -- 4 -- 6 -- 8 -- 10 -- --
Array Implementation (1)
Parent(r) = ⌊(r - 1)/2⌋ if r ≠ 0
Leftchild(r) = 2r + 1 if 2r + 1 < n
Rightchild(r) = 2r + 2 if 2r + 2 < n
Leftsibling(r) = r - 1 if r is even and r ≠ 0
Rightsibling(r) = r + 1 if r is odd and r + 1 < n
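The table above is generated by simple index arithmetic on the array; a sketch checking a few entries against it (n = 12, as in the table):

```java
public class CompleteTreeIndex {
    static final int n = 12;  // number of nodes, as in the table

    static int parent(int r)       { return (r - 1) / 2; }  // valid for r > 0
    static int leftChild(int r)    { return 2 * r + 1; }    // valid if < n
    static int rightChild(int r)   { return 2 * r + 2; }    // valid if < n
    static int leftSibling(int r)  { return r - 1; }        // r even, r > 0
    static int rightSibling(int r) { return r + 1; }        // r odd, r+1 < n

    public static void main(String[] args) {
        System.out.println(parent(5));       // 2, as in the table
        System.out.println(leftChild(4));    // 9
        System.out.println(rightChild(3));   // 8
        System.out.println(rightSibling(9)); // 10
    }
}
```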
Binary Search Trees
BST Property: All elements stored in the left
subtree of a node with value K have values < K.
All elements stored in the right subtree of a node
with value K have values >= K.
BSTNode (1)
class BSTNode<K,E> implements BinNode<E> {
private K key;
private E element;
private BSTNode<K,E> left;
private BSTNode<K,E> right;
public BSTNode() {left = right = null; }
public BSTNode(K k, E val)
{ left = right = null; key = k; element = val; }
public BSTNode(K k, E val,
BSTNode<K,E> l, BSTNode<K,E> r)
{ left = l; right = r; key = k; element = val; }
public K key() { return key; }
public K setKey(K k) { return key = k; }
public E element() { return element; }
public E setElement(E v) { return element = v; }
BSTNode (2)
public BSTNode<K,E> left() { return left; }
public BSTNode<K,E> setLeft(BSTNode<K,E> p)
{ return left = p; }
public BSTNode<K,E> right() { return right; }
public BSTNode<K,E> setRight(BSTNode<K,E> p)
{ return right = p; }
public boolean isLeaf()
{ return (left == null) && (right == null); }
}
BST (1)
/** BST implementation for Dictionary ADT */
class BST<K extends Comparable<? super K>, E>
implements Dictionary<K, E> {
private BSTNode<K,E> root; // Root of BST
int nodecount; // Size of BST
/** Constructor */
BST() { root = null; nodecount = 0; }
/** Reinitialize tree */
public void clear()
{ root = null; nodecount = 0; }
/** Insert a record into the tree.
@param k Key value of the record.
@param e The record to insert. */
public void insert(K k, E e) {
root = inserthelp(root, k, e);
nodecount++;
}
BST (2)
/** Remove a record from the tree.
@param k Key value of record to remove.
@return Record removed, or null if
there is none. */
public E remove(K k) {
E temp = findhelp(root, k); // find it
if (temp != null) {
root = removehelp(root, k); // remove it
nodecount--;
}
return temp;
}
BST (3)
/** Remove/return root node from dictionary.
@return The record removed, null if empty. */
public E removeAny() {
if (root != null) {
E temp = root.element();
root = removehelp(root, root.key());
nodecount--;
return temp;
}
else return null;
}
/** @return Record with key k, null if none.
@param k The key value to find. */
public E find(K k)
{ return findhelp(root, k); }
/** @return Number of records in dictionary. */
public int size() { return nodecount; }
}
BST Search
private E findhelp(BSTNode<K,E> rt, K k) {
if (rt == null) return null;
if (rt.key().compareTo(k) > 0)
return findhelp(rt.left(), k);
else if (rt.key().compareTo(k) == 0)
return rt.element();
else return findhelp(rt.right(), k);
}
BST Insert (1)
BST Insert (2)
private BSTNode<K,E>
inserthelp(BSTNode<K,E> rt, K k, E e) {
if (rt == null) return new BSTNode<K,E>(k, e);
if (rt.key().compareTo(k) > 0)
rt.setLeft(inserthelp(rt.left(), k, e));
else
rt.setRight(inserthelp(rt.right(), k, e));
return rt;
}
Get/Remove Minimum Value
private BSTNode<K,E>
getmin(BSTNode<K,E> rt) {
if (rt.left() == null)
return rt;
else return getmin(rt.left());
}
private BSTNode<K,E>
deletemin(BSTNode<K,E> rt) {
if (rt.left() == null)
return rt.right();
else {
rt.setLeft(deletemin(rt.left()));
return rt;
}
}
BST Remove (1)
BST Remove (2)
/** Remove a node with key value k
@return The tree with the node removed */
private BSTNode<K,E>
removehelp(BSTNode<K,E> rt, K k) {
if (rt == null) return null;
if (rt.key().compareTo(k) > 0)
rt.setLeft(removehelp(rt.left(), k));
else if (rt.key().compareTo(k) < 0)
rt.setRight(removehelp(rt.right(), k));
BST Remove (3)
else { // Found it, remove it
if (rt.left() == null)
return rt.right();
else if (rt.right() == null)
return rt.left();
else { // Two children
BSTNode<K,E> temp = getmin(rt.right());
rt.setElement(temp.element());
rt.setKey(temp.key());
rt.setRight(deletemin(rt.right()));
}
}
return rt;
}
Time Complexity of BST Operations
Find: O(d)
Insert: O(d)
Delete: O(d)
Priority Queues (1)
Problem: We want a data structure that stores
records as they come (insert), but on request,
releases the record with the greatest value
(removemax)
Priority Queues (2)
Possible Solutions:
- insert appends to an array or a linked list ( O(1) )
and then removemax determines the maximum
by scanning the list ( O(n) )
- Keep the linked list sorted in decreasing order;
insert places an element in its correct position
( O(n) ) and removemax simply removes the head
of the list ( O(1) ).
- Use a heap – both insert and removemax are
O( log n ) operations
Heaps
Heap: Complete binary tree with the heap
property:
• Min-heap: All values less than child values.
• Max-heap: All values greater than child values.
Max Heap Example
88 85 83 72 73 42 57 6 48 60
Max Heap Implementation (1)
public class MaxHeap<K extends Comparable<? super K>, E> {
private KVpair<K,E>[] Heap; // Pointer to heap array
private int size; // Maximum size of heap
private int n; // # of things in heap
public MaxHeap(KVpair<K,E>[] h, int num, int max)
{ Heap = h; n = num; size = max; buildheap(); }
public int heapsize() { return n; }
public boolean isLeaf(int pos) // Is pos a leaf position?
{ return (pos >= n/2) && (pos < n); }
public int leftchild(int pos) { // Leftchild position
assert pos < n/2 : "Position has no left child";
return 2*pos + 1;
}
public int rightchild(int pos) { // Rightchild position
assert pos < (n-1)/2 : "Position has no right child";
return 2*pos + 2;
}
public int parent(int pos) {
assert pos > 0 : "Position has no parent";
return (pos-1)/2;
}
Sift Down
public void buildheap() // Heapify contents
{ for (int i=n/2-1; i>=0; i--) siftdown(i); }
private void siftdown(int pos) {
assert (pos >= 0) && (pos < n) :
"Illegal heap position";
while (!isLeaf(pos)) {
int j = leftchild(pos);
if ((j<(n-1)) &&
(Heap[j].key().compareTo(Heap[j+1].key())
< 0))
j++; // index of child w/ greater value
if (Heap[pos].key().compareTo(Heap[j].key())
>= 0)
return;
DSutil.swap(Heap, pos, j);
pos = j; // Move down
}
}
RemoveMax, Insert
public KVpair<K,E> removemax() {
assert n > 0 : "Removing from empty heap";
DSutil.swap(Heap, 0, --n);
if (n != 0) siftdown(0);
return Heap[n];
}
public void insert(KVpair<K,E> val) {
assert n < size : "Heap is full";
int curr = n++;
Heap[curr] = val;
// Siftup until curr parent's key > curr key
while ((curr != 0) &&
(Heap[curr].key().
compareTo(Heap[parent(curr)].key())
> 0)) {
DSutil.swap(Heap, curr, parent(curr));
curr = parent(curr);
}
}
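A compact int-only sketch of the same siftdown and buildheap logic, simplified from the generic class above:

```java
public class IntMaxHeap {
    /** Push the value at pos down until the heap property holds below it. */
    static void siftdown(int[] heap, int n, int pos) {
        while (2 * pos + 1 < n) {             // while pos has a left child
            int j = 2 * pos + 1;
            if (j + 1 < n && heap[j + 1] > heap[j])
                j++;                          // pick the larger child
            if (heap[pos] >= heap[j]) return; // heap property holds
            int t = heap[pos]; heap[pos] = heap[j]; heap[j] = t;
            pos = j;                          // continue from that child
        }
    }

    /** Heapify the whole array, bottom up. */
    static void buildheap(int[] heap) {
        for (int i = heap.length / 2 - 1; i >= 0; i--)
            siftdown(heap, heap.length, i);
    }

    public static void main(String[] args) {
        int[] h = {6, 48, 60, 72, 88, 85, 83, 42, 57, 73};
        buildheap(h);
        System.out.println(h[0]); // 88: the maximum is now at the root
    }
}
```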
Heap Building Analysis
• Insert into the heap one value at a time:
– Push each new value down the tree from the root
to where it belongs
– Σ log i = Θ(n log n)
• Starting with full array, work from bottom up
– Since nodes below form a heap, just need to push
current node down (at worst, go to bottom)
– Most nodes are at the bottom, so not far to go
– Σ (i-1)·n/2^i = Θ(n)
Huffman Coding Trees
ASCII codes: 8 bits per character.
• Fixed-length coding.
Huffman Tree Construction (1)
Huffman Tree Construction (2)
Assigning Codes
Letter Freq Code Bits
C 32
D 42
E 120
M 24
K 7
L 42
U 37
Z 2
Coding and Decoding
A set of codes is said to meet the prefix
property if no code in the set is the prefix
of another.
Decode 1011001110111101:
General Trees
General Tree Node
interface GTNode<E> {
public E value();
public boolean isLeaf();
public GTNode<E> parent();
public GTNode<E> leftmostChild();
public GTNode<E> rightSibling();
public void setValue(E value);
public void setParent(GTNode<E> par);
public void insertFirst(GTNode<E> n);
public void insertNext(GTNode<E> n);
public void removeFirst();
public void removeNext();
}
General Tree Traversal
/** Preorder traversal for general trees */
static <E> void preorder(GTNode<E> rt) {
  PrintNode(rt);
  if (!rt.isLeaf()) {
    GTNode<E> temp = rt.leftmostChild();
    while (temp != null) {
      preorder(temp);
      temp = temp.rightSibling();
    }
  }
}
Parent Pointer Implementation
Equivalence Class Problem
The parent pointer representation is good for
answering:
– Are two elements in the same tree?
/** Determine if nodes are in different trees */
public boolean differ(int a, int b) {
  Integer root1 = FIND(array[a]);
  Integer root2 = FIND(array[b]);
  return !root1.equals(root2); // compare values, not Integer references
}
Union/Find
/** Merge two subtrees */
public void UNION(int a, int b) {
  Integer root1 = FIND(a); // Find a's root
  Integer root2 = FIND(b); // Find b's root
  if (!root1.equals(root2)) array[root2] = root1;
}
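The slides use FIND without showing it; below is one possible array-based version with path compression, a common enhancement that points every visited node directly at its root. `ParentPtrTree` and its field names are illustrative, not the textbook's class:

```java
import java.util.Arrays;

// Sketch: parent-pointer trees with FIND using path compression.
public class ParentPtrTree {
  int[] parent;

  ParentPtrTree(int n) {
    parent = new int[n];
    Arrays.fill(parent, -1);       // -1 marks a root
  }

  int find(int x) {                // FIND with path compression:
    if (parent[x] == -1) return x;
    parent[x] = find(parent[x]);   // point x directly at the root
    return parent[x];
  }

  void union(int a, int b) {       // merge the two trees
    int r1 = find(a), r2 = find(b);
    if (r1 != r2) parent[r2] = r1;
  }

  boolean differ(int a, int b) { return find(a) != find(b); }
}
```

Path compression keeps the trees shallow, so later FIND calls are nearly constant time.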
Leftmost Child/Right Sibling (1)
Leftmost Child/Right Sibling (2)
Linked Implementations (1)
Linked Implementations (2)
Efficient Linked Implementation
Sequential Implementations (1)
List node values in the order they would be visited
by a preorder traversal.
Saves space, but allows only sequential access.
Need to retain tree structure for reconstruction.
Example: For binary trees, use a symbol to mark
null links.
AB/D//CEG///FH//I//
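The example sequence above can be turned back into a tree by reading it left to right, treating '/' as a null link. A minimal sketch (the class, `pos` cursor, and `serialize` helper are illustrative, not the slides' code):

```java
// Sketch: rebuild a binary tree from a preorder sequence with '/' null marks.
public class SeqTreeDemo {
  static class Node { char val; Node left, right; Node(char v) { val = v; } }

  static int pos;                      // cursor into the sequence

  static Node rebuild(String seq) {
    if (seq.charAt(pos) == '/') { pos++; return null; }
    Node n = new Node(seq.charAt(pos++));
    n.left = rebuild(seq);             // preorder: value, then left,
    n.right = rebuild(seq);            // then right subtree
    return n;
  }

  static String serialize(Node t) {    // inverse: preorder with null marks
    if (t == null) return "/";
    return t.val + serialize(t.left) + serialize(t.right);
  }

  public static void main(String[] args) {
    pos = 0;
    Node root = rebuild("AB/D//CEG///FH//I//");
    System.out.println(serialize(root)); // prints AB/D//CEG///FH//I//
  }
}
```

Serializing the rebuilt tree reproduces the input exactly, which confirms the null marks carry enough structure for reconstruction.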
Sequential Implementations (2)
RAC)D)E))BF)))
Sorting
Each record contains a field called the key.
– Linear order: comparison.
Measures of cost:
– Comparisons
– Swaps
Insertion Sort (1)
Insertion Sort (2)
static <E extends Comparable<? super E>>
void Sort(E[] A) {
  for (int i=1; i<A.length; i++) // insert i'th record
    for (int j=i;
         (j>0) && (A[j].compareTo(A[j-1])<0);
         j--)
      DSutil.swap(A, j, j-1);
}
Best Case: Θ(n) comparisons, 0 swaps
Worst Case: Θ(n²) comparisons and swaps
Average Case: Θ(n²) comparisons and swaps
Bubble Sort (1)
Bubble Sort (2)
static <E extends Comparable<? super E>>
void Sort(E[] A) {
  for (int i=0; i<A.length-1; i++) // bubble up i'th record
    for (int j=A.length-1; j>i; j--)
      if ((A[j].compareTo(A[j-1]) < 0))
        DSutil.swap(A, j, j-1);
}
Best Case: Θ(n²) comparisons, 0 swaps
Worst Case: Θ(n²) comparisons and swaps
Average Case: Θ(n²) comparisons and swaps
Selection Sort (1)
Selection Sort (2)
static <E extends Comparable<? super E>>
void Sort(E[] A) {
  for (int i=0; i<A.length-1; i++) { // select i'th record
    int lowindex = i;                // remember its index
    for (int j=A.length-1; j>i; j--) // find the least value
      if (A[j].compareTo(A[lowindex]) < 0)
        lowindex = j;
    DSutil.swap(A, i, lowindex);     // put it in place
  }
}
Best Case: Θ(n²) comparisons, Θ(n) swaps
Worst Case: Θ(n²) comparisons, Θ(n) swaps
Average Case: Θ(n²) comparisons, Θ(n) swaps
Pointer Swapping
Summary
              Insertion   Bubble    Selection
Comparisons:
Best Case       Θ(n)      Θ(n²)      Θ(n²)
Average Case    Θ(n²)     Θ(n²)      Θ(n²)
Worst Case      Θ(n²)     Θ(n²)      Θ(n²)
Swaps:
Best Case        0          0        Θ(n)
Average Case    Θ(n²)     Θ(n²)      Θ(n)
Worst Case      Θ(n²)     Θ(n²)      Θ(n)
Exchange Sorting
All of the sorts so far rely on exchanges of
adjacent records; on average, Θ(n²) such
exchanges are needed, so any exchange sort
costs Ω(n²) in the average case.
Binsort (1)
Ways to generalize Binsort:
– Make each bin the head of a list.
– Allow more keys than records.
Binsort (2)
static void binsort(Integer A[]) {
  List<Integer>[] B =
    (LList<Integer>[])new LList[MaxKey];
  Integer item;
  for (int i=0; i<MaxKey; i++)
    B[i] = new LList<Integer>();
  for (int i=0; i<A.length; i++)
    B[A[i]].append(A[i]); // place record in its bin
  for (int i=0; i<MaxKey; i++) // collect bins in order
    for (B[i].moveToStart();
         (item = B[i].getValue()) != null;
         B[i].next())
      output(item);
}
Cost: Θ(n + MaxKey)
Radix Sort (1)
Radix Sort (2)
static void radix(Integer[] A, Integer[] B,
                  int k, int r, int[] count) {
  int i, j, rtok;
  for (i=0, rtok=1; i<k; i++, rtok*=r) {
    for (j=0; j<r; j++) count[j] = 0;
    // Count # of recs for each bin on this pass
    for (j=0; j<A.length; j++)
      count[(A[j]/rtok)%r]++;
    // count[j] is index in B for last slot of bin j
    for (j=1; j<r; j++)
      count[j] = count[j-1] + count[j];
    for (j=A.length-1; j>=0; j--) // stable placement
      B[--count[(A[j]/rtok)%r]] = A[j];
    for (j=0; j<A.length; j++) A[j] = B[j]; // copy B back
  }
}
Radix Sort Example
Radix Sort Cost
Cost: Θ(nk + rk)
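A self-contained variant of the radix routine, specialized to int keys with r = 10 (decimal digits) and k digit passes, can make the cost concrete. `RadixDemo` and the sample data are illustrative, not from the slides:

```java
import java.util.Arrays;

// Sketch: radix sort on int keys, radix r = 10, k passes.
public class RadixDemo {
  static void radix(int[] A, int[] B, int k, int r, int[] count) {
    int rtok = 1;
    for (int i = 0; i < k; i++, rtok *= r) {
      for (int j = 0; j < r; j++) count[j] = 0;
      for (int j = 0; j < A.length; j++)      // count records per bin
        count[(A[j] / rtok) % r]++;
      for (int j = 1; j < r; j++)             // prefix sums: count[j] is
        count[j] += count[j - 1];             // one past bin j's last slot
      for (int j = A.length - 1; j >= 0; j--) // place from the end: stable
        B[--count[(A[j] / rtok) % r]] = A[j];
      System.arraycopy(B, 0, A, 0, A.length); // copy B back for next pass
    }
  }

  public static void main(String[] args) {
    int[] a = {27, 91, 1, 97, 17, 23, 84, 28, 72, 5};
    radix(a, new int[a.length], 2, 10, new int[10]);
    System.out.println(Arrays.toString(a));
    // prints [1, 5, 17, 23, 27, 28, 72, 84, 91, 97]
  }
}
```

With n = 10 records, r = 10 bins, and k = 2 passes, the work per pass is Θ(n + r), giving Θ(nk + rk) overall.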
Design Issues
Disadvantage of message passing:
• Messages are copied and passed back and
forth.
Disadvantages of buffer passing:
• The user is given access to system memory (the
buffer itself)
• The user must explicitly tell the buffer pool when
buffer contents have been modified, so that
modified data can be rewritten to disk when the
buffer is flushed.
• The pointer might become stale when the
bufferpool replaces the contents of a buffer.
Some Goals
• Be able to avoid reading data when the
block contents will be replaced.
• Be able to support multiple users
accessing a buffer, and independently
releasing a buffer.
• Don't make an active buffer stale.
Improved Interface
public interface BufferPoolADT {
  Buffer acquireBuffer(int block);
}
close()
read(byte[] b)
write(byte[] b)
seek(long pos)
External Sorting
Problem: Sorting data sets too large to fit
into main memory.
– Assume data are stored on disk drive.
Jump Search
• Algorithm:
– Check every k'th element (L[k], L[2k], ...).
– If K is greater, then go on.
– If K is less, then use linear search on the k
elements.
Worst-case cost: min over 1 ≤ k ≤ n of (⌈n/k⌉ + k − 1)
Jump Search Analysis (cont)
Take the derivative and solve for T'(x) = 0
to find the minimum.
Roughly 2√n comparisons.
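The optimal step size k ≈ √n derived above can be sketched directly. `JumpSearchDemo` is an illustrative name, and the linear-scan bounds are one reasonable way to handle the edges:

```java
// Sketch: jump search with step size k = floor(sqrt(n)).
public class JumpSearchDemo {
  // Returns the index of key in sorted array L, or -1 if absent.
  static int jumpSearch(int[] L, int key) {
    int n = L.length;
    int k = (int) Math.sqrt(n);
    if (k == 0) k = 1;                    // guard tiny arrays
    int i = k - 1;
    while (i < n && L[i] < key) i += k;   // jump by k while key is greater
    int lo = i - k + 1;                   // linear search in the last block
    for (int j = lo; j < n && j <= i; j++)
      if (L[j] == key) return j;
    return -1;
  }
}
```

At most ⌈n/k⌉ jumps plus k − 1 linear steps are made, which is about 2√n when k = √n.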
Lessons
We want to balance the work done while
selecting a sublist with the work done while
searching a sublist.
For a geometric distribution of access frequencies:
p(i) = 1/2^i       if 1 ≤ i ≤ n−1
p(i) = 1/2^(n−1)   if i = n

Expected search cost:
C_n ≈ Σ_{i=1}^{n} i/2^i ≈ 2.
Zipf Distributions
Applications:
– Distribution for frequency of word usage in
natural languages.
– Distribution for populations of cities, etc.
C_n = Σ_{i=1}^{n} i/(i·H_n) = n/H_n ≈ n/log_e n,
where H_n is the n-th harmonic number.
80/20 rule:
– 80% of accesses are to 20% of the records.
– For distributions following 80/20 rule,
C_n ≈ 0.122n.
Self-Organizing Lists
Self-organizing lists modify the order of
records within the list based on the actual
pattern of record accesses.
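One common self-organizing heuristic is move-to-front: each successful search moves the accessed record to the head of the list. A minimal sketch using `java.util.LinkedList` (the `MTFList` class and method names are illustrative):

```java
import java.util.LinkedList;

// Sketch: a self-organizing list using the move-to-front heuristic.
public class MTFList<E> {
  private final LinkedList<E> list = new LinkedList<>();

  public void add(E item) { list.addLast(item); }

  // Search for item; on success, move it to the front.
  public boolean access(E item) {
    int i = list.indexOf(item);   // i + 1 comparisons is the search cost
    if (i < 0) return false;
    list.remove(i);
    list.addFirst(item);          // reorder to speed up future accesses
    return true;
  }

  public E front() { return list.getFirst(); }
}
```

Records accessed frequently migrate toward the front, so the list adapts to skewed access patterns like the 80/20 rule above.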
Files and Indexing
Entry sequenced file: Order records by time
of insertion.
– Search with sequential search
Keys and Indexing
Primary Key: A unique identifier for records.
May be inconvenient for search.
Secondary Key: An alternate search key,
often not unique for each record. Often
used as the search key.
Linear Indexing (1)
Linear index: Index file organized as a
simple sequence of key/record pointer
pairs, with the key values in sorted order.
Linear indexing is good for searching
variable-length records.
Linear Indexing (2)
If the index is too large to fit in main
memory, a second-level index might be
used.
Tree Indexing (1)
Linear index is poor for insertion/deletion.
Tree Indexing (2)
Difficulties when storing tree
index on disk:
– Tree must be balanced.
– Each path from root to leaf
should cover few disk pages.
2-3 Tree
A 2-3 Tree has the following properties:
1. A node contains one or two keys
2. Every internal node has either two children
(if it contains one key) or three children (if it
contains two keys).
3. All leaves are at the same level in the tree,
so the tree is always height balanced.
2-3 Tree Example
The advantage of the 2-3 Tree over the BST
is that it can be updated at low cost.
2-3 Tree Insertion (1)
2-3 Tree Insertion (2)
2-3 Tree Insertion (3)
B-Trees (1)
The B-Tree is an extension of the 2-3 Tree.
B-Trees (2)
1. B-Trees are always balanced.
2. B-Trees keep similar-valued records
together on a disk page, which takes
advantage of locality of reference.
3. B-Trees guarantee that every node in the
tree will be full at least to a certain
minimum percentage. This improves
space efficiency while reducing the
typical number of disk fetches necessary
during a search or update operation.
B-Tree Definition
A B-Tree of order m has these properties:
– The root is either a leaf or has two children.
– Each node, except for the root and the
leaves, has between ⌈m/2⌉ and m children.
– All leaves are at the same level in the tree,
so the tree is always height balanced.
B+-Trees
The most commonly implemented form of the
B-Tree is the B+-Tree.
Internal nodes of the B+-Tree do not store
records -- only key values to guide the search.
Leaf nodes store records or pointers to records.
A leaf node may store more or fewer records
than an internal node stores keys.
B+-Tree Example
B+-Tree Insertion
B+-Tree Deletion (1)
B+-Tree Deletion (2)
B+-Tree Deletion (3)
B-Tree Space Analysis (1)
B+-Tree nodes are always at least half full.
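A worked example of the half-full guarantee, assuming an order of m = 100 (an illustrative value, not from the slides):

```latex
% Minimum fan-out per non-root internal node, order m = 100:
\left\lceil m/2 \right\rceil = \left\lceil 100/2 \right\rceil = 50
% The root has at least 2 children, so a tree of height 3
% (root, one internal level, leaves) has at least
2 \cdot 50 = 100 \text{ leaf nodes,}
% and a tree of height 4 has at least
2 \cdot 50^{2} = 5000 \text{ leaf nodes.}
```

Even the worst-case (half-full) tree stays very shallow, which bounds the number of disk fetches per search.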
Adjacency List:
Graph ADT
interface Graph {                  // Graph class ADT
  public void Init(int n);         // Initialize
  public int n();                  // # of vertices
  public int e();                  // # of edges
  public int first(int v);         // First neighbor
  public int next(int v, int w);   // Next neighbor after w
  public void setEdge(int i, int j, int wght);
  public void delEdge(int i, int j);
  public boolean isEdge(int i, int j);
  public int weight(int i, int j);
  public void setMark(int v, int val);
  public int getMark(int v);       // Get v's Mark
}
Graph Traversals
Some applications require visiting every
vertex in the graph exactly once.
Examples:
– Artificial Intelligence Search
– Shortest paths problems
Graph Traversals (2)
To ensure visiting all vertices:
void graphTraverse(Graph G) {
  int v;
  for (v=0; v<G.n(); v++)
    G.setMark(v, UNVISITED); // Initialize
  for (v=0; v<G.n(); v++)
    if (G.getMark(v) == UNVISITED)
      doTraverse(G, v);
}
Depth First Search (1)
// Depth first search
void DFS(Graph G, int v) {
  PreVisit(G, v);          // Take appropriate action
  G.setMark(v, VISITED);
  for (int w = G.first(v); w < G.n();
       w = G.next(v, w))
    if (G.getMark(w) == UNVISITED)
      DFS(G, w);
  PostVisit(G, v);         // Take appropriate action
}
Depth First Search (2)
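The traversal pattern above can be exercised end to end on a small adjacency-list graph. This is a hedged, self-contained sketch: `DFSDemo` replaces the slides' Graph ADT with a minimal adjacency-list structure and records the previsit order:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: DFS over a minimal adjacency-list graph, recording previsit order.
public class DFSDemo {
  static final int UNVISITED = 0, VISITED = 1;
  int n;
  List<List<Integer>> adj;
  int[] mark;
  List<Integer> order = new ArrayList<>();

  DFSDemo(int n) {
    this.n = n;
    adj = new ArrayList<>();
    for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
    mark = new int[n];                 // all UNVISITED initially
  }

  void addEdge(int i, int j) {         // undirected edge
    adj.get(i).add(j);
    adj.get(j).add(i);
  }

  void dfs(int v) {
    order.add(v);                      // PreVisit action
    mark[v] = VISITED;
    for (int w : adj.get(v))
      if (mark[w] == UNVISITED) dfs(w);
  }

  void traverse() {                    // visit every vertex exactly once,
    for (int v = 0; v < n; v++)        // even across components
      if (mark[v] == UNVISITED) dfs(v);
  }

  public static void main(String[] args) {
    DFSDemo g = new DFSDemo(5);
    g.addEdge(0, 1); g.addEdge(0, 2); g.addEdge(3, 4);
    g.traverse();
    System.out.println(g.order); // prints [0, 1, 2, 3, 4]
  }
}
```

The outer loop in `traverse` mirrors `graphTraverse` above: it restarts DFS in each unvisited component, so the disconnected pair {3, 4} is still reached.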