
CSE 220: Handout 11

Trees
Chapter 4

1
Trees
 In lists, searching is O(n). In stacks and
queues, search is not supported
 In trees, data is organized better so that
search, insertion, and deletion are all efficient
 A tree consists of a set of nodes
1. A tree is either empty or
2. consists of a distinguished node r, called
the root, and zero or more trees T1, T2, ..., Tk,
called subtrees, disjoint from each other
and r. There is an edge from r to the
roots of these subtrees

 Nodes contain data, edges impose a


hierarchical structure on the data

2
Example: Unix File System

[Figure: a directory tree rooted at /usr; its children are the user directories /alice, /bob, and /charlie, which in turn contain subdirectories such as /code, /info, /courses, /mail, /cse220, /cse121, /homeworks, /notes, and the file hw1]

3
Example: Expression Tree

a+b*c*(d+e)

[Figure: the expression tree for a + b * c * (d + e); the root is the + joining a with the product b * c * (d + e), the other operators are internal nodes, and the leaves are the operands a, b, c, d, e]

4
Preliminaries
 If there is an edge from node u to v then u
is the parent of v and v is a child of u
 A tree with n nodes has n − 1 edges
 Nodes with same parent are siblings
 Degree of a node: number of its non-empty
children
 A node of degree 0 is a leaf, a node of
positive degree is internal
 A path is a sequence n1, n2, ..., nk of nodes
such that each nj is the parent of nj+1. The
length of this path is k − 1
 If there is a path from u to v, then u is an
ancestor of v, and v is a descendent of u
5
 Every node is an ancestor and a descendent
of itself
 The depth of a node u in a tree is the length
of the unique path from the root to u
 The height of a node u is the maximum over
lengths of paths from u to a leaf
 Root has depth 0, leaves have height 0
 Height of a tree = height of its root = Depth
of its deepest leaf = Depth of a tree
 Level d: all nodes at the same depth d

6
Implementation
 Obvious strategy: each node has data item
and pointers to all children
 Problem: the number of children can vary
greatly, and may not be known in advance
 Solution: every node contains
{ data item
{ pointer to its first child
{ pointer to its right sibling
 To access all children of a node u, follow
firstChild pointer of u to a node v, and
then starting from v keep following rightSibling
pointers
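A minimal Java sketch of this representation (the field names firstChild and rightSibling come from the slide; the class itself is my own illustration):

// General tree node in the first-child / right-sibling representation.
class TreeNode {
    Object data;            // the data item stored at this node
    TreeNode firstChild;    // leftmost child, or null for a leaf
    TreeNode rightSibling;  // next sibling to the right, or null

    TreeNode(Object data) { this.data = data; }

    // Visit all children of this node by walking the sibling chain.
    void printChildren() {
        for (TreeNode v = firstChild; v != null; v = v.rightSibling)
            System.out.println(v.data);
    }
}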

7
Sample Representation

[Figure: the tree with root A and children B, C, D, where B has child E and D has children F, G, H, I, shown both as a general tree and in its first-child/right-sibling representation]

8
Tree Traversal
 Suppose we want to visit all nodes of a tree.
Different choices for the order in which data
items are examined
 Preorder: Root first, then children
printPreorder ( treeNode u ){
output data at u;
for each child v of u
printPreorder(v); }

 Postorder: Children first, then root


printPostorder ( treeNode u ){
for each child v of u
printPostorder(v);
output data at u; }
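A runnable version of both traversals over the first-child/right-sibling representation (TreeNode is the sketch from slide 7; the method names follow the pseudocode):

// Preorder: output the node, then traverse each child subtree.
static void printPreorder(TreeNode u) {
    if (u == null) return;
    System.out.println(u.data);
    for (TreeNode v = u.firstChild; v != null; v = v.rightSibling)
        printPreorder(v);
}

// Postorder: traverse each child subtree, then output the node.
static void printPostorder(TreeNode u) {
    if (u == null) return;
    for (TreeNode v = u.firstChild; v != null; v = v.rightSibling)
        printPostorder(v);
    System.out.println(u.data);
}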

9
Traversal Example

[Figure: the same tree as before: root A with children B, C, D; B has child E; D has children F, G, H, I]

Preorder: A B E C D F G H I

Postorder: E B C F G H I D A

10
Binary Trees
 A tree T is a binary tree if all its nodes are
of degree at most 2
 Every node has two children left and right
(one or both of these can be empty)
 Depth of a binary tree with n nodes can be
as small as log n and as large as n − 1
 In a full binary tree, every internal node has
two nonempty children (i.e. degree of each
node is 0 or 2)
 A complete binary tree is obtained by filling
the tree by levels from left to right (if depth
is d then leaves can appear only at depth d
and d − 1)

11
Sample binary trees

12
Counting Theorems
 Theorem: The number of leaves in a nonempty
full binary tree T is one more than the
number of internal nodes
 Proof by induction on the number of
internal nodes
 Base case: 0 internal nodes. Since T is nonempty,
it has 1 leaf
 Inductive hypothesis: a tree with k internal
nodes has k + 1 leaves
 Inductive step: let T have k + 1 internal
nodes. T must have an internal node u
whose both children are leaves. Remove
children of u and make u a leaf. Let the resulting
tree be T′. Tree T′ has k internal nodes, so it has
k + 1 leaves. Leaves of T = (leaves of T′ minus u)
plus the two children of u, so T has k + 2 leaves.
13
Binary Search Tree
 The most important application of
binary trees is to improve search
 Assume that the data items (also called keys)
are ordered (in our examples we will use
integers)
 A binary tree is a binary search tree if at
every node u
{ the keys in the left subtree of u are smaller
than the key of u
{ the keys in the right subtree of u are
larger than the key of u

14
Building a BST
[Figure: building a BST by inserting 37, 24, 42, 7, 40, 32, 120 one at a time; the final tree has 37 at the root, 24 and 42 as its children, 7 and 32 under 24, and 40 and 120 under 42]

15
Another BST
 Same keys inserted in a different order:
120, 42, 7, 2, 32, 37, 24, 40

[Figure: the resulting BST is badly unbalanced: 120 is the root, 42 its left child, 7 below 42, with 2 and 32 as the children of 7, 24 and 37 under 32, and 40 under 37]

16
Binary Search Tree ADT
class binaryNode {
binaryNode left ;
binaryNode right ;
Object item ; // Objects can be compared
}
class BST {
private binaryNode root; // pointer to root
// sample public methods
BST(); // constructor
void insert (Object x);
void remove (Object x);
boolean find (Object x);
Object findMin ();
boolean isEmpty();
}
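As a usage sketch (assuming the methods behave as their names suggest; this demo is not part of the handout):

public class BSTDemo {
    public static void main(String[] args) {
        BST t = new BST();
        t.insert(37);                     // Integer keys autobox to Object
        t.insert(24);
        t.insert(42);
        System.out.println(t.find(24));   // expected: true
        System.out.println(t.findMin());  // expected: 24
        t.remove(42);
        System.out.println(t.isEmpty());  // expected: false
    }
}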

17
Search Routine
 Simple recursive strategy: compare key u
with the data v at the root
{ base case: null tree: report fail
{ if u = v, success
{ if u < v, search in the left subtree
{ if u > v search in the right subtree
binaryNode find (binaryNode r,
Object x) {
// searches the tree rooted at r
// returns reference to node with x
// returns null if x is not found
if ( r == null ) return null;
else if ( x < r.item )
return find ( r.left, x );
else if ( x == r.item ) return r;
else return find ( r.right, x );
}
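As real Java, the comparisons above need keys that implement Comparable rather than raw Object; the following minimal sketch makes that assumption explicit (the generic Node class is mine, not the handout's):

// Binary search tree node over a Comparable key type.
class Node<T extends Comparable<T>> {
    T item;
    Node<T> left, right;
    Node(T item) { this.item = item; }
}

// Search the subtree rooted at r for x; null means not found.
static <T extends Comparable<T>> Node<T> find(Node<T> r, T x) {
    if (r == null) return null;
    int cmp = x.compareTo(r.item);
    if (cmp < 0) return find(r.left, x);    // x is smaller: go left
    if (cmp > 0) return find(r.right, x);   // x is larger: go right
    return r;                               // equal: found
}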

18
findMin routine
 Spec: find the smallest data item in the tree
 Property: Desired item is at the deepest
node of the left-most branch
 Recursive routine
binaryNode findMin (binaryNode r) {
if (r == null) return null;
else if (r.left == null) return r;
return findMin (r.left); }

 Nonrecursive routine
binaryNode findMin (binaryNode r) {
if (r != null)
while (r.left != null) r=r.left;
return r; }

19
Insertion
 Similar recursive routine:
{ descend to a leaf comparing key with the
data at nodes
{ create left/right child as appropriate
binaryNode insert ( binaryNode r,
Object x) {
if ( r == null )
r = new binaryNode (x,null,null);
// note: the caller must store the returned
// reference, e.g. root = insert(root, x)
else if ( x < r.item )
r.left = insert ( r.left, x );
else if ( x > r.item )
r.right = insert ( r.right, x );
// no action taken if x==r.item
return r;
}

20
Removing the Minimum
To remove minimum element from a tree
 Follow left pointers as long as possible
 Arrive at node u with smallest key
 Update left pointer of parent of u to point
to right child of u

[Figure: the BST rooted at 37 after its minimum key has been removed; 24 and 42 are the children of 37, and 32 remains as the right child of 24]

21
Removing the minimum
binaryNode removeMin(binaryNode r){
// removes the node with the minimum key from the
// subtree rooted at r and returns the new subtree
// root; the caller stores it back, e.g.
// r.right = removeMin(r.right)
if (r.left != null) {
r.left = removeMin ( r.left );
return r;
}
return r.right; // r holds the minimum: splice it out
}

22
Removal Strategy
To remove a key x from a BST
 Search to find the node u with data x. Let
v be parent of u.
 If u is a leaf then update child-pointer of v
to null
 If u has just one child, then update child-
pointer of v to child of u
 If u has two children:
{ Update data at u to the least value greater
than x
{ Least value greater than x: minimum in
the right subtree of u
{ apply removeMin to right child of u
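Combining these cases, a remove routine might look as follows; this is a sketch in the style of the find example after slide 18 (it reuses that Node<T> class), not the handout's own code, with findMin and removeMin restated for the generic node type.

// Smallest node in a non-empty subtree: keep following left links.
static <T extends Comparable<T>> Node<T> findMin(Node<T> r) {
    while (r.left != null) r = r.left;
    return r;
}

// Remove the minimum node and return the new subtree root.
static <T extends Comparable<T>> Node<T> removeMin(Node<T> r) {
    if (r.left == null) return r.right;   // r is the minimum
    r.left = removeMin(r.left);
    return r;
}

// Remove key x from the subtree rooted at r; returns the new root.
static <T extends Comparable<T>> Node<T> remove(Node<T> r, T x) {
    if (r == null) return null;                // x not present
    int cmp = x.compareTo(r.item);
    if (cmp < 0)       r.left  = remove(r.left, x);
    else if (cmp > 0)  r.right = remove(r.right, x);
    else if (r.left == null)  return r.right;  // leaf or single child
    else if (r.right == null) return r.left;   // single child
    else {
        r.item  = findMin(r.right).item;       // least value greater than x
        r.right = removeMin(r.right);          // delete that node
    }
    return r;
}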

23
Removal Example
Deleting the keys 32, 7, 37:

[Figure: starting from the BST with root 37 (children 24 and 42; 7 and 32 under 24; 2 under 7; 40 and 120 under 42): deleting 32 removes a leaf; deleting 7, which has one child, makes 2 the left child of 24; deleting 37, which has two children, replaces its key with 40, the minimum of its right subtree, and removes that node, leaving root 40 with children 24 and 42, 2 under 24, and 120 under 42]

24
Some comments
 Cost of each operation is proportional to the
height of the tree
 Cost is about n in the worst case, but log n
in the average case
 Variations possible:
{ Duplicate keys allowed. Then keys in the
right subtree must be ≥ the key at the node
{ In databases, leaves point to records (e.g.
all info on a student), and internal nodes
have keys (unique key per record e.g. SSN)

25
Binary Search Trees: Recap
 Data organized so that search is efficient
 Cost of find, insert, remove is proportional
to the height of tree
 Height can vary between log n and n − 1
depending on the order in which keys are
inserted/deleted
 Challenge: can we reorganize the tree during
insertion and deletion so that height stays
small?

26
AVL trees
 Intuition: to keep height small, we would
like both children to have roughly equal
number of nodes
 Requiring same height for both children is
too restrictive
 A binary search tree is an AVL tree if for
every node u, the heights of u's children
differ by at most 1 (assume the empty tree to
be of height −1)
 We will see that
{ Height of an AVL tree with n nodes is
O(log n)
{ Insert and remove routines can be
modified to maintain AVL property
 Note: a complete tree is an AVL tree
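To make the definition concrete, here is a minimal sketch of an AVL node with an explicit height field (integer keys and the helper names are my own; the handout only requires the height bookkeeping):

// AVL tree node; the empty tree has height -1, a leaf has height 0.
class AvlNode {
    int key;
    int height;            // height of the subtree rooted at this node
    AvlNode left, right;
    AvlNode(int key) { this.key = key; this.height = 0; }
}

static int height(AvlNode t) {
    return (t == null) ? -1 : t.height;
}

// AVL condition at a single node: children's heights differ by at most 1.
static boolean balancedAt(AvlNode t) {
    return Math.abs(height(t.left) - height(t.right)) <= 1;
}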
27
Height of an AVL tree
 Let N (h) denote the minimum number of
nodes in an AVL tree of height h
 Base cases: N (0) = 1, N (1) = 2
 Inductive case: For h ≥ 2, an AVL tree
{ must have a root
{ one of the children has height h − 1
{ heights of the two children differ by at most
1 (worst case for a minimum-size tree: one
child has the smaller height)
N(h) = N(h − 1) + N(h − 2) + 1

 Looks similar to the Fibonacci sequence:
grows exponentially
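For example, unrolling the recurrence: N(2) = N(1) + N(0) + 1 = 4, N(3) = 7, N(4) = 12, N(5) = 20, so the smallest AVL tree of height 5 (next slide) has only 20 nodes, whereas a complete binary tree of height 5 has 63.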
 Thm: Height of an AVL tree with n nodes
is at most 1.44 log n
29
Smallest AVL tree of height 5

30
Insertion
 Insert as in a usual binary search tree
 If this causes height imbalance (i.e. some
node has children whose heights differ by
2), adjust the tree structure
 Adjustments are called rotations, and
require updating a few pointers
 Example: inserting 120, 42, 7

[Figure: inserting 120, 42, 7 in that order produces a left chain with 120 above 42 above 7; a rotation makes 42 the root with children 7 and 120]

31
LL Rotation (Single Left)
 Trees X , Y , Z have identical height h before
insertion
 Node is inserted into X making its height
h + 1 so node u becomes unbalanced

[Figure: before the rotation, u is the root with left child v and right subtree Z, and v has subtrees X and Y; after the rotation, v is the root with left subtree X and right child u, and u has subtrees Y and Z]

32
LL Rotation
 LL rotation of (u, v) means
{ Update u.left to v.right
{ Update v.right to u
 After rotation, tree is still a binary search
tree
{ Every key in Y is smaller than u
{ u is larger than v
 After rotation, tree is an AVL tree
{ Height of u= height of X = h+1
 Before rotation, u was parent of v, and after
rotation, v is parent of u
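In terms of the AvlNode sketch above, the two pointer updates, plus the height bookkeeping that the handout leaves implicit, might read:

// LL (single left) rotation at u; v = u.left becomes the new subtree root.
static AvlNode rotateLL(AvlNode u) {
    AvlNode v = u.left;
    u.left  = v.right;   // subtree Y moves under u
    v.right = u;         // u becomes v's right child
    u.height = 1 + Math.max(height(u.left), height(u.right));
    v.height = 1 + Math.max(height(v.left), height(v.right));
    return v;            // the caller links v where u used to be
}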

33
Another Sample LL
[Figure: inserting 51 (below 58) into the tree rooted at 50 makes node 72 unbalanced; an LL rotation of (72, 61) makes 61 the right child of 50, with 58 and 72 as its children, while 51 stays under 58 and 65 and 100 become the children of 72]

34
RR Rotation (Single Right)
 Symmetric to LL
 Trees X , Y , Z have identical height h before
insertion
 Node is inserted into Z making its height
h + 1 so node u becomes unbalanced
 Update u.right to v.left,
update v.left to u
[Figure: before the rotation, u is the root with left subtree X and right child v, and v has subtrees Y and Z; after the rotation, v is the root with left child u and right subtree Z, and u has subtrees X and Y]

35
Sample LR Rotation
 Inserting 120, 42, 7, 2, 32, 37
 42 is unbalanced due to insertion in right-
subtree (32) of the left-child (7) of 42
 Rotation involves three nodes

[Figure: before, 42 is the root with children 7 and 120, 7 has children 2 and 32, and 37 hangs below 32; after the LR rotation, 32 is the root with children 7 and 42, 2 stays under 7, and 37 and 120 become the children of 42]

36
General LR Rotation
[Figure: before the rotation, u is the root with left child v and right subtree A; v has left subtree X and right child w, and w has subtrees Y and Z; after the rotation, w is the root with children v and u, where v has subtrees X and Y and u has subtrees Z and A]

37
LR rotation
 Trees X and A have height h. After
insertion, w has height h + 1 making u
unbalanced
 LR rotation of (u, v, w) means
{ update v.right to w.left
{ update u.left to w.right
{ set w.left to v and w.right to u
 Transformation preserves the binary search
tree property
 Resulting tree is AVL (both u and v have
height h + 1)
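Translated into the same AvlNode sketch, the three pointer updates read as follows (again with height bookkeeping added):

// LR (double) rotation at u; w = u.left.right becomes the new subtree root.
static AvlNode rotateLR(AvlNode u) {
    AvlNode v = u.left;
    AvlNode w = v.right;
    v.right = w.left;    // subtree Y moves under v
    u.left  = w.right;   // subtree Z moves under u
    w.left  = v;
    w.right = u;
    v.height = 1 + Math.max(height(v.left), height(v.right));
    u.height = 1 + Math.max(height(u.left), height(u.right));
    w.height = 1 + Math.max(height(w.left), height(w.right));
    return w;
}

The same effect can be obtained by applying the single right rotation of slide 35 at v and then the single left rotation of slide 32 at u.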

38
RL Rotation
[Figure: before the rotation, u is the root with left subtree X and right child v; v has right subtree A and left child w, and w has subtrees Y and Z; after the rotation, w is the root with children u and v, where u has subtrees X and Y and v has subtrees Z and A]

39
An RL example
[Figure: inserting 13 (below 7) into the tree rooted at 4 makes node 6 unbalanced; an RL rotation of (6, 14, 7) makes 7 the right child of 4, with 6 and 14 as its children, 5 under 6, and 13 and 15 under 14]

40
Are we done?
 Implementation of AVL trees: maintain a
height field with each node
 Suppose we insert a leaf u in an AVL tree
 Trace the path from leaf u to root: u1, u2, ..., uk
(u1 equals u and uk equals root)
 Either none of the nodes on this path are
unbalanced (we are done) or let ui be the
first unbalanced node
 Depending on whether ui−1 is the left or right
child of ui, and whether ui−2 is the left or right
child of ui−1, we have four cases: LL, LR,
RL, RR. In each case apply the correspond-
ing transformation
 Deletion requires similar rotations
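Putting the four cases together, a sketch of the re-balancing step after an insertion, built on the AvlNode, height, rotateLL, and rotateLR sketches above (rotateRR and rotateRL, the mirror images from slides 35 and 39, are included here for completeness; all names are mine):

// RR (single right) rotation at u, the mirror image of rotateLL.
static AvlNode rotateRR(AvlNode u) {
    AvlNode v = u.right;
    u.right = v.left;
    v.left  = u;
    u.height = 1 + Math.max(height(u.left), height(u.right));
    v.height = 1 + Math.max(height(v.left), height(v.right));
    return v;
}

// RL (double) rotation at u, the mirror image of rotateLR.
static AvlNode rotateRL(AvlNode u) {
    u.right = rotateLL(u.right);   // first bring w above u.right
    return rotateRR(u);            // then bring w above u
}

// Restore the AVL property at u after an insertion below it; returns the
// (possibly new) root of this subtree. Called at every node on the way
// back up, it finds the first unbalanced node ui automatically.
static AvlNode rebalance(AvlNode u) {
    u.height = 1 + Math.max(height(u.left), height(u.right));
    if (height(u.left) - height(u.right) == 2) {          // left-heavy
        if (height(u.left.left) >= height(u.left.right))
            return rotateLL(u);                            // LL case
        else
            return rotateLR(u);                            // LR case
    }
    if (height(u.right) - height(u.left) == 2) {           // right-heavy
        if (height(u.right.right) >= height(u.right.left))
            return rotateRR(u);                            // RR case
        else
            return rotateRL(u);                            // RL case
    }
    return u;                                              // already balanced
}

A recursive insert would then end with return rebalance(u) at each level.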
41
B-trees
 Another variant of balanced trees
 Used widely in databases
 Preferred when the data structure resides on
the disk (rather than main memory)
 Special case: 2-3 trees

42
2-3 Trees
 Generalization of binary search tree
 Nodes have up to 2 keys and up to 3 children

[Figure: a node with a single key k1 has two children, holding keys < k1 and keys > k1; a node with keys k1 and k2 has three children, holding keys < k1, keys between k1 and k2, and keys > k2]

43
Sample 2-3 Tree
[Figure: a 2-3 tree with root (18 33); its children are (12), (23 30), and (48); the leaves are (10) and (15) under (12); (20 21), (24), (31) under (23 30); and (45 47), (50 52) under (48)]

44
Definition
 A node contains one or two keys
 An internal node u with a single key k1 has
two children: u.left and u.center. All keys
in subtree u.left are < k1, all keys in subtree
u.center are > k1
 An internal node u with two keys k1 and
k2 has three children: u.left, u.center, and
u.right. All keys in subtree u.left are < k1,
all keys in subtree u.center are > k1 and
< k2, all keys in subtree u.right are > k2
 All leaves are at the same depth
(equivalently, all siblings have equal heights)

45
Some Comments
 If N (h) denotes the minimum number of
keys in a 2-3 tree of height h then
N(0) = 1, N(h) = 1 + 2N(h − 1)

 If M(h) denotes the maximum number of
keys in a 2-3 tree of height h then
M(0) = 2, M(h) = 2 + 3M(h − 1)

 The number of keys in a 2-3 tree of height
h is at most 3^(h+1) − 1 and at least 2^(h+1) − 1
 Height of a 2-3 tree with n nodes is Θ(log n)

 Routine to search for a key in a 2-3 tree is
straightforward (running time proportional
to height)
46
Implementation notes
 The node-type TwoThreeNode has the fields
NumKeys, Key1, Key2, left, center, right

 The ADT TwoThreeTree supports insert,
remove, and find
 We will discuss insertion.
Deletion is analogous.
 All operations are proportional to height,
hence Θ(log n)
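A minimal Java sketch of this node type, with integer keys and the field names from the slide (case adjusted for Java; the constructor and isLeaf helper are my additions):

// Node of a 2-3 tree: one or two keys, two or three children.
// A leaf has all child pointers null.
class TwoThreeNode {
    int numKeys;                        // 1 or 2
    int key1, key2;                     // key2 is used only when numKeys == 2
    TwoThreeNode left, center, right;   // right is used only when numKeys == 2

    TwoThreeNode(int key) { numKeys = 1; key1 = key; }

    boolean isLeaf() { return left == null; }
}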

47
Insertion Phase 1
 Goal: to insert a key k in a 2-3 tree
 Phase 1: Starting at the root, proceed to
a leaf executing the following algorithm at
each node
 If the node u has 1 key then
{ if k < u.key1 then proceed to u.left
{ if k > u.key1 then proceed to u.center
 If the node u has 2 keys then
{ if k < u.key1 then proceed to u.left
{ if k > u.key2 then proceed to u.right
{ if k < u.key2 and k > u.key1 then
proceed to u.center
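In terms of the TwoThreeNode sketch above, phase 1 is a short loop (a sketch only; it assumes k is not already present in the tree):

// Phase 1: descend from node u to the leaf where key k belongs.
static TwoThreeNode findLeaf(TwoThreeNode u, int k) {
    while (!u.isLeaf()) {
        if (k < u.key1)                        u = u.left;
        else if (u.numKeys == 1 || k < u.key2) u = u.center;
        else                                   u = u.right;
    }
    return u;
}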

48
Sample Insertion
To insert 19 in the following tree, starting at
u0, descend to u2 and then to leaf u4
[Figure: the same 2-3 tree with nodes and pointers labeled: the root u0 = (18 33) has child pointers p1, p2, p3 to u1 = (12), u2 = (23 30), and u3 = (48); u2's child pointers p4, p5, p6 lead to the leaves u4 = (20 21), u5 = (24), u6 = (31); the remaining leaves hold (10), (15), (45 47), (50 52)]

Each ui is a node, and pi is pointer to ui

49
Insertion Phase 2: Leaf
 Suppose we are inserting key k in leaf u
 If u has only 1 key k1, then set u.key1 to
min(k, k1), u.key2 to max(k, k1), and u.NumKeys
to 2
 If u has two keys k1 and k2, then
{ update u.key1 to min(k, k1, k2)
update u.NumKeys to 1
(u will have only one key)
{ create a new leaf node v with
v.NumKeys = 1 and
v.key1 = max(k, k1, k2)
{ Promote pointer q to v and the key mid(k, k1, k2)
to the parent of u

50
Inserting 19 in leaf u4

[Figure: before, the leaf u4 (reached via pointer p4) holds keys 20 21; after inserting 19, u4 holds 19 and a new leaf v4 (reached via q4) holds 21]

Key 20 and pointer q4 promoted to node u2

51
Insertion Phase 2: Internal node
 Suppose we are inserting key k and pointer
p in an internal node u
 If u has only 1 key k1:
{ for k > k1: u.key2 = k, u.right = p
{ for k < k1: u.key1 = k, u.key2 = k1,
u.right = u.center, u.center = p
 If u has two keys k1 and k2, and pointers
pL, pC and pR, then there are three cases depending
on the ordering of k, k1, and k2. Let us
consider the case k < k1:
{ update u.NumKeys to 1, u.key1 to k,
and u.center to p
{ create a new node v with fields NumKeys =
1, key1 = k2, left = pC, center = pR
{ promote key k1 and pointer to v
52
Inserting key 20 and pointer q4 in u2

[Figure: before, node u2 (reached via p2) holds keys 23 30 with child pointers p4, p5, p6; after inserting key 20 and pointer q4, u2 holds key 20 with children p4 and q4, and a new node v2 (reached via q2) holds key 30 with children p5 and p6]

Key 23 and pointer q2 promoted to node u0

53
Inserting key 23 and pointer q2 in u0

[Figure: before, the root u0 (reached via p0) holds keys 18 33 with child pointers p1, p2, p3; after inserting key 23 and pointer q2, u0 holds key 18 with children p1 and p2, and a new node v0 (reached via q0) holds key 33 with children q2 and p3]

Key 23 and pointer q0 promoted

54
Insertion phase 2: new root
 Suppose the root r gets split and promotes
key k together with pointer p
 Create a new node v with one key k and
left = r and center = p
 The new node v will be the root of the tree
[Figure: the new root holds key 23, with pointer p0 to u0 (key 18, children p1 and p2) and pointer q0 to v0 (key 33, children q2 and p3)]

55
Result of inserting key 19
[Figure: the 2-3 tree after inserting 19: the new root is (23) with children (18) and (33); (18) has children (12) and (20), whose leaves are (10), (15), (19), (21); (33) has children (30) and (48), whose leaves are (24), (31), (45 47), (50 52)]

56
Alternative Variant
 All keys are in leaves. A leaf can have 2 or
3 keys
 key1 in an internal node u denotes the small-
est key in a leaf of u.center, key2 denotes
the smallest key in a leaf of u.right (may be
absent)

[Figure: a tree of this variant: the root holds key 22; its children are an internal node with key 16 and an internal node with keys 41, 58; the leaves are (8, 11, 12) and (16, 17) under the first child, and (22, 23, 31), (41, 52), (58, 59, 61) under the second]

57
B-tree
 Can store multiple keys in a node with
multiple branches
 A B-tree of order m has
{ Root is either a leaf or has between 2 and m
children
{ All internal nodes (except root) have between
⌈m/2⌉ and m children
{ All leaves are at the same depth
{ Data stored at leaves. Each leaf has between
⌈m/2⌉ and m keys
{ An internal node with children p0, p1, ..., pl
has keys k1, ..., kl, where ki is the mini-
mum key in subtree pi
 A 2-3 tree is a B-tree of order 3

58
Sample B-tree of order 4

[Figure: a B-tree of order 4 with root (24); its children are the internal nodes (15, 20) and (33, 45, 48); the leaves are (5, 10), (15, 16, 18), (20, 23) under the first and (24, 28, 30), (33, 34, 35), (45, 46), (48, 50) under the second]

59
More on B-trees
 Height of a B-tree of order m with n keys:
O(log_m n)
 Insert requires similar promotion (splitting
root causes new root with two children)
 Cost of find, insert, remove: height times
cost of processing a node
 Cost of processing a node is O(m) (can be
made O(log m) for find if the keys at a node
are organized as an AVL tree)
 In databases, nodes are stored on a disk.
Cost of accessing nodes is much higher than
cost of processing
 m chosen based on how many keys + point-
ers can be stored on a single block (e.g. 256)
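As an illustrative calculation using the slide's example value m = 256 (my arithmetic, not from the handout): a B-tree of order 256 holding a billion keys has height about log_256(10^9), i.e. roughly 4, so a find touches only a handful of disk blocks.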
60
