Arrays
So in this lecture we're talking about arrays and linked lists.
In this video, we're going to talk about arrays.
So here are some examples of declarations of arrays in a couple of different languages. Alongside
them, we can see a one-dimensional array laid out with five elements in it, and then a two-dimensional
array with two rows and five columns.
So what's the definition of an array? Well, we've got basically a contiguous piece of memory. That is,
one chunk of memory. It can either be on the stack or in the heap; it doesn't really matter where
it is.
It is broken down into equal-sized elements, and each of those elements is indexed by contiguous
integers. All three of these things are important for defining an array.
Here, in this particular example, we have an array whose indices are from 1 to 7. In many languages,
the same indices for this particular array would be from zero to six. So it would be zero based
indexing, but one based indexing is also possible in some languages. And other languages allow you to
actually specify what the initial index is.
What's so special about arrays? Well, the key point about an array is we have random access. That is,
we have constant time access to any particular element in an array. Constant time access to read,
constant time access to write.
How does that actually work? Well basically what that means is we can just do arithmetic to figure
out the address of a particular array element.
So the first thing we need to do is start with the address of the array.
So we take the address of the array and add to it the element size times (i - first_index), where i is
the index of interest. This is where the key property that every element is the same size matters: it
allows us to do a simple multiplication. If, instead, each of the array elements were of a different
size, we'd have to sum up the sizes, and summing n items would take order n time.
If we're doing zero based indexing, that first index isn't really necessary. I like this example because it
really shows a more general case where we do have a first index.
Let's say for instance we're looking at the address for index four. We would take four minus the first
index, which is one, which would give us three. Multiply that by whatever our element size is, and
then add that to our array address. Now of course, we don't have to do this work, the compiler or
interpreter does this work for us, but we can see how it is that it works in constant-time.
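The address arithmetic the lecture walks through can be sketched in a few lines of Python. The base address and element size here are made-up numbers, purely for illustration:

```python
# Sketch of the address arithmetic described above. The base address and
# element size are hypothetical, just to make the computation concrete.

def element_address(array_address, element_size, first_index, i):
    """Address of element i of a 1-D array whose indices start at first_index."""
    return array_address + element_size * (i - first_index)

# The lecture's example: indices 1..7, looking up index 4.
# (4 - 1) = 3 elements to skip, each element_size bytes wide.
print(element_address(1000, 8, 1, 4))  # 1000 + 8 * 3 = 1024
```

With zero-based indexing, `first_index` is 0 and the subtraction drops out, which is exactly the point made above.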
Many languages also support multi-dimensional arrays; if not, you can actually roll your own, as in
the example I'll show you here, where you do the arithmetic yourself. So here, let's say that the top-left
element is at index (1, 1), and here's the index (3, 4). This means we're in row 3, column 4. How do we
find the address of that element? Well, first off, we need to skip the full rows that come before it.
That is, we skip 3 minus 1 rows, the row index minus the initial row index. That gives us 2 times 6,
or 12 elements we're skipping for those rows in order to get to row 3. Then we've got to skip the
elements before (3, 4) in the same row. There are three of them. How do we get that? We take the
column index, which is 4, and subtract the initial column index, which is 1. So this basically gives
us 15 elements to skip in all.
Six for the first row, six for the second row and then three for the third row before this particular
element. We take that 15 and multiply it by our element size and then add it to our array address.
And that will give us the address of our element (3,4).
Now we made kind of a supposition here. And that was that the way this was laid out is we laid out all
the elements of the first row, followed by all of the elements of the second row, and so on. That's
called row-major ordering or row-major indexing. And what we do is basically, we lay out, (1, 1), (1,
2), (1, 3), (1, 4), (1, 5), (1, 6). And then right after that in memory (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2,
6). So the column index is changing most rapidly as we're looking at successive elements. And that's
an indication that it's row-major indexing.
We could lay out arrays differently, and some languages or compilers actually do that, where they
would lay out each column in order, so you'd have the first column, then the second column, and
then the third column. And so that, then, the successive elements would be (1, 1), (2, 1), (3, 1),
followed by (1, 2), (2, 2), (3, 2), and so on.
So there we see that the row index is changing most rapidly, and this is called column-major ordering.
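Here is a minimal sketch of the row-major and column-major offset arithmetic, with 1-based indices to match the lecture's example (the function names are just illustrative):

```python
# Offsets (in elements) for 2-D indexing, with row and column indices starting at 1.

def row_major_offset(row, col, n_cols):
    # Skip the full rows above us, then the elements before us in our row.
    return (row - 1) * n_cols + (col - 1)

def column_major_offset(row, col, n_rows):
    # Skip the full columns before us, then the elements above us in our column.
    return (col - 1) * n_rows + (row - 1)

# The lecture's example: element (3, 4) of an array with 6 columns.
print(row_major_offset(3, 4, n_cols=6))  # 2 * 6 + 3 = 15
```

Multiply the resulting offset by the element size and add the array's base address, exactly as in the one-dimensional case.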
How long does it take to perform operations? We already said to read any element is O(1), and to
write any element is O(1). That is a standard feature of arrays. What happens if we want to add an
element at the end of an array? So let's say we have allocated seven elements for an array. We're
only using four of them, okay? So we have kept track that we're using four and we want to add a fifth
element. And again, there's room for seven. Then all we need to do is add it and update the
number of elements that are in use. That's an O(1) operation. If we want to remove the last element
as well, that's also an O(1) operation, because we just update the number of elements that are in use.
Where it gets to be expensive, is if we want to, for instance, remove the first element. So we remove
the five here, and what we've got to do then, we don't want to have holes left in it. So we need to
move the 8 down, move the 3 down, move the 12 down. That's an O(n) operation.
Same thing would happen if we wanted to insert at the beginning. So we would need to move the 12,
move the 3, and move the 8 to make space for our new element. So that also would be O(n).
And if we want to add or remove somewhere in the middle, again that's an O(n) operation. If we want
to add exactly in the middle, we have to move n/2 items, which is O(n). Same thing for removal. So
arrays are great if you want to add or remove at the end. But it's expensive if you want to add or
remove in the middle or at the beginning.
However, remember, a huge advantage for arrays is that we have this constant time access to
elements, either read or write.
In summary then, an array consists of a contiguous area of memory, because if it were non-contiguous
then we couldn't just do this simple arithmetic to get where we're going. We have to have
equal-size elements, again so our arithmetic works, and indexing by contiguous integers, again so our
arithmetic works.
We have constant time access to any element, constant time to add or remove at the end and linear
time to add and remove at any arbitrary location.
In our next video we're going to talk about linked lists.
Now let's talk about linked lists.
So linked lists, it's named kind of like links in a chain, right, so we've got a head pointer that points to
a node that then has some data and points to another node, points to another node and eventually
points to one that doesn't point any farther. So here in our top diagram we show head points to the
node containing 7, points to the node containing 10, points to the node containing 4, points to the
node containing 13 doesn't point anywhere. How this actually works is that a node contains a key
which in this case is these integers, and a next pointer. The diagram below shows more detail of
what's going on. So head is a pointer that points to a node, and that node contains two elements: the
value 7, and then a pointer that points off to the next node, which contains the key 10. That has a
pointer that points off to the next node, 4, which points off to the next node, 13, and 13's next
pointer is just nil.
What are the operations that can be done on a linked list? There are several of them, and their names
sometimes differ in different environments and different libraries. But normally the
operations provided are roughly these. So we can add an element to the front of the list, and that
we're calling PushFront. So that takes a key, adds it to the front of the list. We can return the front
element of the list. We're calling that TopFront. Or we can remove the front element of the list, called
PopFront. The same things that we can do at the front of the list, we can also do at the end of the list.
With PushBack, later on in a later module, we'll actually use the word Append for that, or TopBack, or
PopBack.
These seem uniform, but there is a difference: the runtimes are going to be different
between them, and we're going to talk about that.
You can find whether an element is in the list, and it's as simple as just running down the
list looking for a matching key.
You can erase an element: again, run down the list until you find the matching key, and
then remove that element. So these latter ones are both O(n) time.
Is the list empty or not? That's as simple as checking is the head equal to nil.
We can add a particular key--if we want to splice in a key into a list we can actually add in a key either
before a given node or after a given node.
So let's look at the times for some common operations.
We've got here our list with four elements in it: 7, 10, 4, and 13. Now we go ahead and push an
element to the front. So we push 26 to the front of the list. So the first thing we do, create a node
that contains the 26 as its key. And then we update our next pointer of that node to point to the
head, which is the 7 element, and then update the head pointer to point to our new node, and that's
it, we're done. So it's O(1): allocate, update one pointer, update another pointer, constant time.
If we want to pop the front element, clearly finding the front element is very cheap here, right? You
can just look at the first element and return it. So TopFront is O(1). PopFront, it turns out, is also
going to be O(1). First thing we're going to do is update the head pointer; then remove the node.
That's an O(1) operation.
If we want to push at the back, and we don't have a tail pointer, we're going to talk about a tail
pointer in a moment, then it's going to be a fairly expensive operation. We're going to have to start at
the head and walk our way down the list until we get to the end, and add a node there, so that's going
to be O(n) time.
Similarly if we want to TopBack or PopBack, we're going to also have to start at the head, walk our
way down to the last element. Those are all going to be O(n) time.
If we had a tail pointer, some of these will become simpler. Okay, so, we're going to have both a head
pointer that points to the head element and a tail pointer that points to the tail element. So, that
way, getting the first element is cheap. Getting the last element is cheap.
Let's look at what happens when we try an insert when we have a tail. We allocate a node, put in our
new key, and we then update the next pointer of the current tail, to point to this new tail. And then
update the tail pointer itself.
O(1) operation.
Retrieving the last element, a TopBack, is also an O(1) operation. We just go to
the tail, find the element, return the key.
If we want to pop the back, however, that's a little bit of an expensive operation. We are going to
need to update the tail to point from 8 to 13. We're at 8 right now and we want to go to 13; the
problem is, how do we get to 13?
We don't have a pointer from 8 to 13; we have a pointer from 13 to 8, and that pointer doesn't help
us going back. So what we've got to do is, again, start at the head and walk our way down until we
find the 13 node, the one that points to the current tail. We then update our tail pointer to point
to that node, update its next pointer to be nil, and then we can remove the old tail. So that's going
to be an O(n) operation, because we've got to walk all the way down there: even though we have a
tail pointer, we don't have a pointer to the next-to-last element.
The head is different, because our pointers point forward. From the head it's also cheap to get the
second element, right, and one more hop to get the third element. But the tail pointer doesn't help
us get to the next-to-last element.
Let's look at some of the code for this, so for PushFront we have a singly linked list: we're going to
allocate a new node, set its key, set its next to point to the old head and then we'll update the current
head pointer.
If the tail is equal to nil, that meant that before the insertion, the head and the tail were nil, it was an
empty list. So we've got to update the tail to point to the same thing the head points to.
Popping the front, well, if we're asked to pop the front on an empty list, that's an error. So that's the
first check we do here and then we just update the head to point now to the head's next. And just in
case that there was only one element in the list and now there are no elements, we check if our new
head is nil and if so update our tail to also be nil. Pushing in the back: allocate a new node, set its key,
set its next pointer, and then check the current tail. If the current tail is nil again, it's an empty list.
Update the head and the tail to point to that new node. Otherwise update the old tail's next to point
to our new node, and then update the tail to point to that new node.
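The operations described so far can be put together as a minimal singly-linked list with head and tail pointers. This is a Python sketch rather than the lecture's pseudocode; names like `push_front` are just snake_case renderings of PushFront:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def push_front(self, key):          # O(1)
        node = Node(key)
        node.next = self.head
        self.head = node
        if self.tail is None:           # list was empty: head and tail coincide
            self.tail = node

    def pop_front(self):                # O(1)
        if self.head is None:
            raise IndexError("empty list")
        key = self.head.key
        self.head = self.head.next
        if self.head is None:           # list is now empty: clear the tail too
            self.tail = None
        return key

    def push_back(self, key):           # O(1), thanks to the tail pointer
        node = Node(key)
        if self.tail is None:           # empty list
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
```

For example, `push_front(7)` followed by `push_back(13)` leaves the head at 7 and the tail at 13, and two calls to `pop_front` return 7 then 13.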
Popping the back.
More difficult, right. If it's an empty list and we're trying to pop, that's an error. If the head is equal to
tail, that means we have one element. So we need to just update the head and the tail to nil.
Otherwise we've got to start at the head, and start working our way down, trying to find the next to
the last element. When we exit the while loop, p will be the next to last element, and we then update
its next pointer to nil.
And set our tail equal to that element.
Adding after a node? Fairly simple in a singly linked list. Allocate a new node,
set its next pointer to the next of whatever node we're adding after. So we sort of splice it in, and
then we need to update the next pointer of the node we're adding after, so that it now points to our
new node. And just in case the node we're adding after was the tail, we've now got to update the
tail to that new node.
Adding before, we have the same problem we had in terms of PopBack in that we don't have a link
back to the previous element. So we have no way of updating its next pointer other than going back
to the beginning of the head and moving our way down until we find it. So AddBefore would be an
O(n) operation.
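PopBack and AddBefore can be sketched like this; the while loops make visible why both are O(n) on a singly-linked list without prev pointers (a Python sketch with illustrative names, assuming a list object with `head` and `tail` fields):

```python
class Node:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

class List:
    def __init__(self):
        self.head = None
        self.tail = None

def pop_back(lst):
    """O(n): we must walk from the head to find the next-to-last node."""
    if lst.head is None:
        raise IndexError("empty list")
    if lst.head is lst.tail:            # exactly one element
        key = lst.head.key
        lst.head = lst.tail = None
        return key
    p = lst.head
    while p.next is not lst.tail:       # walk to the next-to-last node
        p = p.next
    key = lst.tail.key
    p.next = None                       # detach the old tail
    lst.tail = p
    return key

def add_before(lst, node, key):
    """O(n): finding node's predecessor requires a walk from the head."""
    new = Node(key, next=node)
    if node is lst.head:
        lst.head = new
        return
    p = lst.head
    while p.next is not node:           # walk to node's predecessor
        p = p.next
    p.next = new
```

With a doubly-linked list, both loops disappear, which is exactly the point of the next part of the lecture.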
So let's summarize what the cost of things are. PushFront, O(1).
TopFront, PopFront, all O(1). Pushing at the back is O(n), unless we have a tail pointer, in which
case it's O(1).
TopBack is O(n), again unless we have a tail pointer, in which case it's O(1). Popping the back is an
O(n) operation, with or without a tail pointer.
Finding a key is O(n): we just walk our way through the list trying to find a particular element.
Erasing is also O(n). Checking whether the list is empty or not is as simple as checking whether the
head is nil. Adding before is O(n), because finding the previous element requires walking all the
way from the head. AddAfter: constant time.
There is a way to make popping the back and adding before cheap.
Our problem was that although we had a way to get from
a previous element to the next element, we had no way to get back.
And what a doubly-linked list says is, well, let's go ahead and
add a way to get back.
So we'll have two pointers, forward and back pointers.
That's the bidirectional arrow we're showing here conceptually.
And the way we would actually implement this is,
with a node that adds an extra pointer.
So we have not only a next pointer, we have a previous pointer.
So this shows for example that the 10 element has a next
pointer that points to 4 but a previous pointer that points to 7.
So at any node we can either go forward or we can go backwards.
So that means if we're trying to pop the back, that's going to work pretty well.
What we're going to do is update the tail pointer to point to the previous element
because again we can get there in an O(1) operation.
And then update its next pointer to be nil and then finally remove the node.
So that's O(1).
So if we have a doubly-linked list, our code is slightly more complicated, because
we've got to make sure to manage both prev pointers as well as next pointers.
So if we're pushing something in the back, we'll allocate a new node.
If the tail is nil, which means it's empty, then we just have a single node
whose prev and next pointers are both nil and then head and tail both point to it.
Otherwise, we need to update the tail's next pointer for
this new node, because we're pushing at the end and
then go update the prev pointer of this new node to point to the old tail and
then finally update the tail pointer itself.
Popping the back, also pretty straightforward.
We're going to again check to see whether this is first an empty list,
in which case it's an error.
A list with only one element, in which case it's simple.
Otherwise we're going to go ahead and
update our tail to be the prev tail, and the next of that node to be nil.
Adding after is fairly simple again; we just need to maintain the prev pointer. But
adding before also now works, in the sense that we can allocate our new node,
and its prev pointer will be the prev pointer of the existing node we're adding before.
We splice it in that way, and
then we'll update the next pointer of that previous node to point to our new node.
And finally, just in case we're adding before the head,
we need to update the head.
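The doubly-linked versions just described might look like this in Python; a sketch, with the prev-pointer bookkeeping the lecture emphasizes:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None
        self.prev = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def push_back(self, key):           # O(1)
        node = Node(key)
        if self.tail is None:           # empty list
            self.head = self.tail = node
        else:
            self.tail.next = node
            node.prev = self.tail
            self.tail = node

    def pop_back(self):                 # O(1) now, thanks to prev pointers
        if self.tail is None:
            raise IndexError("empty list")
        key = self.tail.key
        if self.head is self.tail:      # exactly one element
            self.head = self.tail = None
        else:
            self.tail = self.tail.prev  # one hop back, no walk from the head
            self.tail.next = None
        return key

    def add_before(self, node, key):    # O(1): node.prev is one hop away
        new = Node(key)
        new.next = node
        new.prev = node.prev
        node.prev = new
        if new.prev is None:            # we added before the head
            self.head = new
        else:
            new.prev.next = new
```

Compare `pop_back` here with the singly-linked version: the while loop is gone, which is the whole payoff of the extra pointer per node.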
So in a singly-linked list, we saw the cost of things.
Working with the front of the list was cheap,
working with the back of the list with no tail, was all linear time.
If we added a tail, it was easy to push something at the end,
easy to retrieve something at the end, but hard to remove something at the end.
By switching to a doubly-linked list,
removing from the end (a PopBack) now becomes an O(1) operation,
as does adding before, which used to be a linear-time operation.
One thing to point out as we contrast arrays versus linked lists.
So in arrays, we have random access,
in a sense that it's constant time to access any element.
That makes things like binary search very simple: if we have a sorted array,
we start searching in the middle, decide which side of the array our element must be on,
and then go to one side or the other.
For a linked list, that doesn't work.
Finding the middle element is an expensive operation, because you've got to start either
at the head or the tail and work your way into the middle.
So that's an O(n) operation to get to any particular element.
Big difference in between that and an array.
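To make the contrast concrete, here's a standard binary search over a sorted Python list; each step jumps straight to the middle element in O(1), which a linked list cannot do:

```python
def binary_search(sorted_array, key):
    """O(log n) on an array, because sorted_array[mid] is O(1) random access."""
    lo, hi = 0, len(sorted_array) - 1
    while lo <= hi:
        mid = (lo + hi) // 2            # jumping to the middle is free on an array
        if sorted_array[mid] == key:
            return mid
        elif sorted_array[mid] < key:
            lo = mid + 1                # key must be in the right half
        else:
            hi = mid - 1                # key must be in the left half
    return -1                           # not found

print(binary_search([3, 7, 10, 13, 26], 13))  # 3
```

On a linked list, each of those middle lookups would itself cost O(n), which is why binary search over a linked list gives no speedup.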
However, linked lists are constant time to insert at or remove from the front,
unlike arrays.
What we saw in arrays, if you want to insert from the front, or
remove from the front, it's going to take you O(n) time because you're going to have to move
a bunch of elements.
If you have a tail pointer and a doubly-linked list,
it is also constant time to work at the end of the list.
So you can add at, read from, or remove from there.
It's linear time to find an arbitrary element.
The list elements are not contiguous as they are in an array.
You have separately allocated locations of memory and
then there are pointers between them.
And then, with a doubly-linked list it's also constant time to insert
between nodes or to remove a node.
Slides and External References
Slides
Download the slides on arrays and linked lists:
References
See Chapter 10.2 in [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Slides
Download the slides for stacks and queues here:
05_2_stacks_and_queues.pdf
References
See Chapter 10.1 in [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Trees
In this lecture, we're going to talk about trees.
Let's look at some example trees.
So here we have a sentence, "I ate the cake".
Now, we're going to look at a syntax tree for that,
which shows the structure of the sentence.
So it's similar to sentence diagramming that you may have done in grade school.
So we have at the top of the tree, the S for
sentence and then children: a noun phrase and a verb phrase.
The child of the noun phrase is the word I from the sentence.
And the child of the verb phrase is a verb and noun phrase, where the verb is ate,
and the noun phrase is a determiner and a noun, the and cake.
So along the bottom of the tree, we have the words from the sentence,
"I ate the cake", and the rest of the tree reflects the structure of that sentence.
We can look here at a syntax tree for the expression 2sin(3z-7);
we can break that up into its structure.
So at the top level, we have a multiplication;
that's really the last thing that's done, multiplying the 2 and the sine.
Within the sine, what we're applying the sine to is 3z-7,
so we have the minus that's happening last with a 7 and then this 3z, 3 times z.
So this shows again the structure of the expression and
the order in which you might evaluate it.
So from the bottom, you would first do 3 times z, and then you would subtract 7
from that, you'd apply the sine to that, and then you multiply that by 2.
Trees are also used to reflect hierarchy.
So this reflects hierarchy of geography where we have at the left hand side
the top level of the hierarchy, the world.
And then below that,
entities in the world, United States, all sorts of other things, United Kingdom.
And then below that, various subcomponents of the geography.
So we've got, for the case of the United States, states, and
then within those states, cities.
Another example of a hierarchy is the animal kingdom.
This is part of it where we've got animals, and then below that, different
types of animals, so invertebrates, reptiles, mammals, and so on.
And then within each of these, we have various subcategorizations.
So this shows this entire hierarchy.
We also use trees in computer science for code.
So in order to represent code, we will do that with an abstract syntax tree.
So our code here is a while loop.
While x is less than 0, x is x+2, f of x.
So we reflect that at the top, we have while, which is our while loop.
And the children of the while loop are the condition that needs to be met for
the while loop to continue and then the statement to execute.
So the condition is x less than 0, so comparison operation, the variable x and
the constant 0.
And then the statement to execute, well, it's actually multiple statements so
we have a block.
And in those blocks, we have two different statements, an assignment statement and
a procedure call.
The assignment statement, the left child is the variable we're assigning to,
which is x, and the right child is an expression, in this case, x+2.
The procedure call, the left child is the name of the procedure, and
subsequent children are the arguments to that procedure.
In our case, we just have one argument x.
A binary search tree is a very common type of tree used in computer science.
The binary search tree is defined by the fact that it's binary, so
that means it has at most two children at each node.
And we have the property that the value of the root node is greater than or
equal to all of the nodes in its left subtree, and
less than all of the nodes in its right subtree.
So here less than or greater than, we're talking about alphabetically.
So Les is greater than Alex, Cathy, and Frank, but
is less than Nancy, Sam, Violet, Tony, and Wendy.
And that same property holds for every node in the tree.
For instance, Violet is greater than or equal to Tony and
strictly less than Wendy.
The binary search tree allows you to search quickly.
For instance, if we wanted to search in this tree for Tony, we could start at Les.
Notice that we are greater than Les, so therefore, we're going to go right.
We're greater than Sam so we'll go right.
We're less than Violet so we'll go left and then we find Tony.
And we do that in just four comparisons.
It's a lot like a binary search in a sorted array.
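That search can be sketched as follows. The tree below is reconstructed from the lecture's figure (Nancy's position as Sam's left child is assumed from the text), and Python string comparison stands in for alphabetical order:

```python
class TreeNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def bst_find(node, key):
    """Walk down the tree: go left if the key is smaller, right if larger."""
    while node is not None:
        if key == node.key:
            return node
        node = node.left if key < node.key else node.right
    return None  # key is not in the tree

# The lecture's example tree of names.
root = TreeNode("Les",
                TreeNode("Cathy", TreeNode("Alex"), TreeNode("Frank")),
                TreeNode("Sam", TreeNode("Nancy"),
                         TreeNode("Violet", TreeNode("Tony"), TreeNode("Wendy"))))

print(bst_find(root, "Tony").key)  # Tony
```

Searching for Tony visits Les, Sam, Violet, then Tony, matching the four comparisons described in the lecture.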
So with all these examples of trees, what's the actual definition of a tree?
Well, the definition is recursive: a tree is either empty, or it's a node that has a key and
a list of child trees.
So if we go back to our example here, Les is a node that
has the key Les and two child trees, the Cathy child tree and the Sam child tree.
The Cathy child tree is a node with a key Cathy and
two child trees, the Alex child tree and the Frank child tree.
Let's look at the Frank child tree.
It's a node with a key Frank and two, well, does it have any child trees?
No, it has no child trees.
So let's look at some other examples.
An empty tree, well, we don't really have a good representation for that,
it's just empty.
A tree with one node is the Fred tree, and it has no children.
A tree with two nodes is a Fred with a single child Sally,
that in itself has no children.
In computer science commonly, trees grow down, so parents are above their children.
So that's why we have Fred above Sally.
So let's look at some other terminology for trees.
So here, we have a tree, Fred is the root of the tree.
So it's the top node in the tree.
And here, the children of Fred are Kate, Sally, and Jim.
We're actually showing that with arrows here; commonly, when you draw trees,
you don't actually show the arrows.
We just assume that if a node is directly above another node,
it's the parent of that node.
A child has a line down directly from a parent, so
Kate is a parent of Sam, and Sam is a child of Kate.
An ancestor is a parent or parent's parents and so on.
So Sam's ancestors are Kate and Fred.
Hugh's ancestors are also Kate and Fred.
Sally's only ancestor is Fred.
A descendant is the inverse of an ancestor: it's a child,
or a child of a child, and so on.
So the descendants of Fred are all of the other nodes since it's the root, Sam,
Hugh, Kate, Sally and Jim.
The descendants of Kate would just be Sam and Hugh.
Siblings are two nodes sharing the same parent,
so Kate, Sally, and Jim are all siblings.
Sam and Hugh are also siblings.
A leaf is a node that has no children.
So that's Sam, Hugh, Sally, and Jim.
The interior nodes are all nodes that aren't leaves.
So this is Kate and Fred.
Another way to describe them is all nodes that do have children.
The level of a node is 1 plus the number of edges between the root and
that node. Let's think about that.
Fred, how many edges are there between the root and the Fred node?
Well, since the Fred node is the root, there are no edges.
So its level would be 1.
Kate has one edge between Fred and Kate,
so its level would be 2, along with its siblings, Sally and Jim.
And Sam and Hugh are level 3.
The height of a node is the maximum depth of its subtree, that is,
the number of nodes on the path from that node down to its farthest leaf.
So here, for instance, if we want to look at the height of Fred,
we want to look at what is its farthest-down descendant.
And so its farthest down descendant would either be Sam or Hugh.
Its height would be 3.
So the leaf heights are 1.
Kate has height 2.
Fred has height 3.
We also have the idea of a forest.
Extending this tree metaphor, so it's a collection of trees.
So we have here two trees with a root Kate and a root Sally, and those form a forest.
So a node has a key, children,
which is a list of children nodes, and then it may or may not have a parent.
The most common representation of trees is probably without the parent pointer.
But it's possible to also have parent pointers, and that can be useful as a way
to traverse from anywhere in a tree to anywhere else by going up and then down,
following parent nodes and then child nodes.
On rare occasions,
you could have a tree that's represented just with parent pointers,
but that's unusual, because typically the way you get access
to a tree is via its root, and you want to go down from there.
There are other, less commonly used representations of trees as well
that we're not going to get into here.
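As a small sketch of this representation (the names `TreeNode` and `add_child` are my own, not from the lecture), a node holds a key, a list of child subtrees, and an optional pointer back up to its parent:

```python
class TreeNode:
    """A general tree node: a key, a list of child subtrees,
    and an optional pointer back up to the parent."""
    def __init__(self, key, parent=None):
        self.key = key
        self.children = []    # list of child TreeNodes
        self.parent = parent  # None for the root

    def add_child(self, key):
        child = TreeNode(key, parent=self)
        self.children.append(child)
        return child

# Build the Les example from the lecture.
les = TreeNode("Les")
cathy = les.add_child("Cathy")
sam = les.add_child("Sam")
cathy.add_child("Alex")
cathy.add_child("Frank")
```

With the parent pointers in place, you can walk from any node up to the root and back down again, as described above.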
Binary trees are very commonly used.
So in a binary tree, each node has at most two children.
Rather than having in this general list of children, for a binary tree,
we normally have an explicit left and right child, either of which can be nil.
As with the normal tree, the general form of a tree, you may or
may not have a parent pointer.
Let's look at a couple of procedures operating on trees.
Since trees are recursively defined, it's very common to write
routines that operate on trees that are themselves recursive.
So for instance,
if we want to calculate the height of a tree, that is the height of a root node,
we can go ahead and recursively do that, going through the tree.
So we can say, for instance, if we have a nil tree, then its height is 0.
Otherwise, it's 1 plus the maximum of the heights of the left child tree and the right child tree.
So if we look at a leaf, for example, its height would be 1, because the height
of its nil left child is 0, and the height of its nil right child is also 0.
So the max of those is 0, and 1 plus 0 is 1.
We could also look at calculating the size of a tree that is the number of nodes.
Again, if we have a nil tree, we have zero nodes.
Otherwise, we have the number of nodes in the left child plus 1 for
ourselves plus the number of nodes in the right child.
So 1 plus the size of the left tree plus the size of the right tree.
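The two recursive procedures just described can be sketched for a binary tree as follows (the `Node` class and the example tree are illustrative stand-ins, not code from the lecture):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(tree):
    if tree is None:          # a nil tree has height 0
        return 0
    # 1 for this node, plus the taller of the two child subtrees
    return 1 + max(height(tree.left), height(tree.right))

def size(tree):
    if tree is None:          # a nil tree has zero nodes
        return 0
    # nodes in the left child, plus 1 for ourselves, plus the right child
    return size(tree.left) + 1 + size(tree.right)

# A small binary tree: Fred with children Kate (who has Sam and Hugh) and Sally.
root = Node("Fred", Node("Kate", Node("Sam"), Node("Hugh")), Node("Sally"))
```

On this tree, `height(root)` is 3 (Fred, Kate, then Sam or Hugh) and `size(root)` is 5, matching the terminology from the previous slides.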
In the next video, we're going to look at different ways to traverse a tree.
Tree Traversal
In this video, we're going to continue talking about trees. And in particular, look at walking a tree, or
visiting the elements of a tree, or traversing the elements of a tree. So often we want to go through
the nodes of a tree in a particular order. We talked earlier, when we were looking at the syntax tree
of an expression, how we could evaluate the expression by working our way up from the leaves. So
that would be one way of walking through a tree in a particular order so we could evaluate. Another
example might be printing the nodes of a tree. If we had a binary search tree, we might want to get
the elements of a tree in sorted order.
There are two main ways to traverse a tree. One is depth-first: we completely traverse one
sub-tree before we go on to a sibling sub-tree. Alternatively, in breadth-first search we traverse all the
nodes at one level before we go to the next level. So in that case, we would traverse all of our siblings
before we visited any of the children of any of the siblings. We'll see some code examples of these. For
depth-first search, we're going to look here at an in-order traversal, and that's really defined best
for a binary tree. This InOrderTraversal is what we might use to print all the nodes of a binary
search tree in alphabetical order.
So, we're going to have a recursive implementation, where if we have a nil tree, we do nothing,
otherwise, we traverse the left sub-tree, and then do whatever we're going to do with the key, visit it,
in this case, we're going to print it. But often there's just some operation you want to carry out, and
then traverse the right sub-tree. So let's look at an example of this. We've got our binary search tree.
And we're going to look at how these nodes get printed out if we do an in-order traversal. So to begin
with, we go to the Les node. And from there, since it's not nil, we're going to do an in-order traversal
of its left child, which is Cathy. Similarly now we're going to do an in-order traversal of its left child,
which is Alex.
We do an in-order traversal of its left child which is nil, so it does nothing. So we come back to Alex,
and then print out Alex, and then traverse its right sub-tree which is nil and does nothing. We come
back to Alex. And then we're finished with Alex and we go back to Cathy. So, we have successfully
completed Cathy's left sub-tree. So we did an in-order traversal of that, so now we're going to print
Cathy, and then do an in-order traversal of its right sub-tree, which is Frank.
So we go to Frank, similarly now we're going to print out Frank.
We've finished with Frank and go back to Cathy, and now we've completed Cathy totally, so we go
back to Les. We completed Les' left sub-tree, so we're now going to print Les and then traverse Les'
right sub-tree. So that is Sam, traverse its left sub-tree which is Nancy. Print it out, go back to Sam,
we've completed Sam's left sub-tree, so we print Sam, and then go ahead and do Sam's right sub-tree
which is Violet, which will end up printing Tony, Violet, and then Wendy. We're completed with
Wendy. We go back to Violet. We completed her right sub-tree, so we go back to Sam, completed his
right sub-tree, go back to Les, completed his right sub-tree, and we're done. So we see we get the
elements out in sorted order. And again, we do the left child. And then the node and then the right
child. And by our definition of a binary search tree, that then gives them to us in order because we
know all the elements in the left child are in fact less than or equal to the node itself.
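A sketch of this in-order traversal on the lecture's tree, collecting keys into a list instead of printing them (the `Node` class is an illustrative stand-in):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def in_order(tree, out):
    if tree is None:           # nil tree: do nothing
        return
    in_order(tree.left, out)   # traverse the left sub-tree
    out.append(tree.key)       # visit the node itself
    in_order(tree.right, out)  # traverse the right sub-tree

# The binary search tree from the example.
root = Node("Les",
            Node("Cathy", Node("Alex"), Node("Frank")),
            Node("Sam", Node("Nancy"),
                 Node("Violet", Node("Tony"), Node("Wendy"))))

keys = []
in_order(root, keys)
# keys is now in sorted order:
# ['Alex', 'Cathy', 'Frank', 'Les', 'Nancy', 'Sam', 'Tony', 'Violet', 'Wendy']
```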
The next depth-first traversal is a pre-order traversal. Now the in-order traversal really is only defined
for a binary tree because we talk about doing the left child and then the node and then the right child.
And so it's not clear if you had let's say three children, where it is you'd actually put the node itself. So
you might do the first child and then print the node, and then second and third child. Or first child and
then second child and print the node, and then third child. It's kind of undefined then, so not well-
defined.
However, these next two, the pre-order and post-order traversal are well defined. Not just for binary
trees, but for general, arbitrary number of children trees.
So here the pre-order traversal says: first, if the tree is nil, we return. We print the key
first, that is, we visit the node itself and then its children. So, in this case, we're going to go ahead and
go to
the Les tree and then print out its key and then go to its children. So we're going to first go to its left
child which is Cathy, and for Cathy, we then print Cathy, and then go to its left child which is Alex,
print Alex, we go back to Cathy.
And we finished its left child, so then we go do its right child, which is Frank. We finished Frank. We
finished Cathy. We go back up to Les. We've already printed Les. We've already visited or traversed
Les' left child. Now we can traverse Les' right child, so it'll be Sam, which we'll print out. And then
we'll go to Nancy, which we'll print out, we'll go back up to Sam and then to Violet, and we will print
Violet, and then print Violet's children, which will be Tony and Wendy and then return back.
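The pre-order traversal just walked through can be sketched the same way, again collecting keys rather than printing (the `Node` class is illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def pre_order(tree, out):
    if tree is None:
        return
    out.append(tree.key)        # visit the node first ("pre")
    pre_order(tree.left, out)   # then its left sub-tree
    pre_order(tree.right, out)  # then its right sub-tree

root = Node("Les",
            Node("Cathy", Node("Alex"), Node("Frank")),
            Node("Sam", Node("Nancy"),
                 Node("Violet", Node("Tony"), Node("Wendy"))))

keys = []
pre_order(root, keys)
# ['Les', 'Cathy', 'Alex', 'Frank', 'Sam', 'Nancy', 'Violet', 'Tony', 'Wendy']
```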
A post-order traversal is like a pre-order traversal except instead of printing the node itself first, which
is the pre, we print it last, which is the post. So all we've really done is move where this print statement
is.
And here then, what's the last of these nodes that's going to be printed? Well, it's actually going to be
Les, because we're not going to be able to print Les until we've finished
completely dealing with Les' left sub-tree and right sub-tree. So we'll visit Les, and then visit Cathy,
and then Alex, and then we'll actually print out Alex. Once we're done with Alex, we'll go back up to
Cathy and down to Frank, and then print out Frank, and then once we're done with both Alex and
Frank we can then print Cathy.
We go back up to Les, and we now need to go deal with Les' right child which is Sam. In order to deal
with Sam we go to Nancy, print Nancy, go back up to Sam and down to Violet, and deal with the
Violet tree, which will print out Tony, and then Wendy, and then Violet. And on our way back up,
then, when we get up to Sam, we have finished its children, so we can print out Sam. When we get up
to Les, we've finished its children, so we can print out Les. One thing to note about the recursive
traversal is that, under the covers, there is a stack being used: every time we make a recursive call to a
procedure, we push another frame onto the call stack, so we are implicitly saving our position in the
tree.
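As the lecture notes, all that changes from the pre-order version is where the visit happens; a sketch (with an illustrative `Node` class, collecting keys instead of printing):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def post_order(tree, out):
    if tree is None:
        return
    post_order(tree.left, out)   # left sub-tree first
    post_order(tree.right, out)  # then the right sub-tree
    out.append(tree.key)         # visit the node last ("post")

root = Node("Les",
            Node("Cathy", Node("Alex"), Node("Frank")),
            Node("Sam", Node("Nancy"),
                 Node("Violet", Node("Tony"), Node("Wendy"))))

keys = []
post_order(root, keys)
# ['Alex', 'Frank', 'Cathy', 'Nancy', 'Tony', 'Wendy', 'Violet', 'Sam', 'Les']
```

Note that the root, Les, comes out last, exactly as described above.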
Breadth-first, we're going to actually use a queue instead of a stack. So in the breadth-first, we are
going to call it level traversal here, we're going to go ahead and instantiate a queue, and on the queue
first put the root of the tree. So we put that in the queue and then while the queue is not empty,
we're going to dequeue, so pull a node off, deal with that by printing it and then if it's got a left child,
enqueue the left child, if it's got a right child, enqueue the right child. And so this will have the effect
of going through and processing the elements in level order. We see the example here, and we're
going to show the queue. So here let's say we're just before the while loop, the queue contains Les.
And we're going to now dequeue Les from the queue, output it by printing it, and then enqueue Les'
children which are Cathy and Sam.
Now, we visit those in order, so first we're going to dequeue Cathy, print it out and then enqueue its
children. Remember when we're enqueuing we go at the end of the line, so Alex and Frank go after
Sam. So now we're going to dequeue Sam, print it, and then enqueue its children Nancy and Violet. So
we can see what we've done then is, we first printed Les, that's level one and then we printed the
elements of level two, which are Cathy and Sam, and now we're going to go on to the elements at
level three. So notice, all the elements in level three, Alex, Frank, Nancy, and Violet are in the queue
already.
And they're all going to be processed before any of the level four nodes
are processed. So even though the level four nodes will be pushed into the queue, since the level
three nodes got there first, they're all going to be processed before we process the level four ones. So
here, we dequeue Alex, print it out, and we're done. Dequeue Frank, print it out, we're done with
Frank. Dequeue Nancy, print it out, we're done with Nancy. And Violet, we print it out, but then also
enqueue Tony and Wendy, and then dequeue those and print them out. So this is a breadth-first
search with an explicit queue. You can also do depth-first searches iteratively rather than recursively,
but you will need an additional data structure, a stack, to keep track of the work still to be done.
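The level traversal just described, with an explicit queue, might be sketched like this (collecting keys into a list rather than printing; the `Node` class is illustrative):

```python
from collections import deque

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def level_traversal(root, out):
    # Breadth-first: an explicit queue replaces the recursion stack,
    # so each level is fully processed before the next level starts.
    if root is None:
        return
    q = deque([root])
    while q:
        node = q.popleft()        # dequeue the next node
        out.append(node.key)      # process (here: record) it
        if node.left:
            q.append(node.left)   # enqueue children at the end of the line
        if node.right:
            q.append(node.right)

root = Node("Les",
            Node("Cathy", Node("Alex"), Node("Frank")),
            Node("Sam", Node("Nancy"),
                 Node("Violet", Node("Tony"), Node("Wendy"))))

keys = []
level_traversal(root, keys)
# ['Les', 'Cathy', 'Sam', 'Alex', 'Frank', 'Nancy', 'Violet', 'Tony', 'Wendy']
```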
So in summary, trees are used for lots of different things in computer science.
We've seen that trees have a key and normally have children, although there are alternative
representations of trees.
The tree walks, or traversals, that are normally done are DFS (depth-first search) and BFS (breadth-
first search). There are different types of depth-first traversals: pre-order, in-order, and post-
order.
When you work with a tree, it's common to use recursive algorithms, although note that we didn't for
the breadth-first search, where we needed to go through the elements of the tree level by level rather
than recursively. And finally, in computer science, trees grow down.
Slides
Download the slides for trees here:
05_3_trees.pdf (PDF file)
References
See chapter 10.4 in [CLRS]: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
To solve programming assignments, you can use any of the following programming languages:
C
C++
C#
Haskell
Java
JavaScript
Python2
Python3
Ruby
Scala
However, we will only be providing starter solution files for C++, Java, and Python3. Your
submission's programming language is detected automatically based on its file extension.
We have reference solutions in C++, Java and Python3, which solve the problem correctly under
the given restrictions, and in most cases spend at most 1/3 of the time limit and at most 1/2 of the
memory limit. You can also use other languages, and we've estimated the time limit multipliers for
them; however, we do not guarantee that there exists a correct solution in those languages for
every problem running under the given time and memory constraints.
Your solution will be compiled as follows. We recommend that when testing your solution locally,
you use the same compiler flags for compiling. This will increase the chances that your program
behaves in the same way on your machine and on the testing machine (note that a buggy
program may behave differently when compiled by different compilers, or even by the same
compiler with different flags).
C# (Mono). File extensions: .cs. Flags:
mcs
Haskell (GHC 7.8.4). File extensions: .hs. Flags:
ghc -O
Java (Open JDK 8). File extensions: .java. Flags:
java -Xmx1024m
JavaScript (Node v6.3.0). File extensions: .js. No flags:
nodejs
Python 2 (CPython 2.7). File extensions: .py2 or .py (a file ending in .py needs to have a first line
which is a comment containing 'python2'). No flags:
python2
Python 3 (CPython 3.4). File extensions: .py3 or .py (a file ending in .py needs to have a first line
which is a comment containing 'python3'). No flags:
python3
Ruby (Ruby 2.1.5). File extensions: .rb. No flags:
ruby
Scala (Scala 2.11.6). File extensions: .scala. No flags:
You need to create a submission and upload the file with your solution in one of the programming
languages C, C++, C#, Haskell, Java, JavaScript, Python2, Python3, Ruby, or Scala. Make sure
that after uploading the file with your solution you press the blue "Submit" button at the bottom.
After that, the grading starts, and the submission being graded is enclosed in an orange rectangle.
After the testing is finished, the rectangle disappears, and the results of the testing of all problems
are shown to you.
I submit the solution for only one problem, but all the problems in the
assignment are graded
Each time you submit any solution, the last uploaded solution for each problem is tested. Don't
worry: this doesn't affect your score even if the submissions for the other problems are wrong. As
soon as you pass the sufficient number of problems in the assignment (see in the pdf with
instructions), you pass the assignment. After that, you can improve your result if you successfully
pass more problems from the assignment. We recommend working on one problem at a time,
checking whether your solution for any given problem passes in the system as soon as you are
confident in it. However, it is better to test it first; please refer to this reading from the "Algorithmic
Toolbox" course.
What are the possible grading outcomes, and how to read them?
Your solution can either pass or not. To pass, it must work without crashing and return the correct
answers on all the test cases we prepared for you, and do so under the time limit and memory
limit constraints specified in the problem statement. If your solution passes, you get the
corresponding feedback "Good job!" and get a point for the problem. If your solution fails, it can be
because it crashes, returns a wrong answer, works for too long, or uses too much memory on some
test case. The feedback will contain the number of the test case on which your solution fails and
the total number of test cases in the system. The tests for the problem are numbered from 1 to the
total number of test cases for the problem, and the program is always tested on all the tests in the
order from the test number 1 to the test with the biggest number.
1. Good job! - Hurrah! Your solution passed, and you get a point!
2. Wrong answer. - Your solution has output an incorrect answer for some test case. If it is a
sample test case from the problem statement, or if you are solving Programming Assignment
1, you will also see the input data, the output of your program and the correct answer.
Otherwise, you won't know the input, the output and the correct answer. Check that you
consider all the cases correctly, avoid integer overflow, output the required whitespace,
output the floating point numbers with the required precision, don't output anything in addition
to what you are asked to output in the output specification of the problem statement. See
this reading on testing from the "Algorithmic Toolbox" course.
3. Time limit exceeded. - Your solution worked longer than the allowed time limit for some
test case. If it is a sample test case from the problem statement, or if you are solving
Programming Assignment 1, you will also see the input data and the correct answer.
Otherwise, you won't know the input and the correct answer. Check again that your algorithm
has a good enough running time estimate. Test your program locally on the test of maximum
size allowed by the problem statement and see how long it works. Check that your program
doesn't wait for some input from the user, which would make it wait forever. See this reading on
testing from the "Algorithmic Toolbox" course.
4. Memory limit exceeded. - Your solution used more than the allowed memory limit for
some test case. If it is a sample test case from the problem statement, or if you are solving
Programming Assignment 1, you will also see the input data and the correct answer.
Otherwise, you won't know the input and the correct answer. Estimate the amount of memory
that your program is going to use in the worst case and check that it is less than the memory
limit. Check that you don't create too large arrays or data structures. Check that you don't
create large arrays or lists or vectors consisting of empty arrays or empty strings, since those
in some cases still eat up memory. Test your program locally on the test of maximum size
allowed by the problem statement and look at its memory consumption in the system.
5. Cannot check answer. Perhaps output format is wrong. - This happens when you
output something completely different than expected. For example, you are required to output the
word "Yes" or "No", but you output the number 1 or 0, or vice versa. Or your program has empty
output. Or your program outputs not only the correct answer, but also some additional
information (this is not allowed, so please follow exactly the output format specified in the
problem statement). Maybe your program doesn't output anything, because it crashes.
6. Unknown signal 6 (or 7, or 8, or 11, or some other). - This happens when your program
crashes. It can be because of division by zero, accessing memory outside of the array
bounds, using uninitialized variables, too deep recursion that triggers stack overflow, sorting
with contradictory comparator, removing elements from an empty data structure, trying to
allocate too much memory, and many other reasons. Look at your code and think about all
those possibilities. Make sure that you use the same compilers and the same compiler
options as we do. Try different testing techniques from this reading from the "Algorithmic
Toolbox" course.
7. Grading failed. - Something very wrong happened with the system. Contact Coursera for
help or write in the forums to let us know.
If your program works incorrectly, it gets feedback from the grader. For the Programming
Assignment 1, when your solution fails, you will see the input data, the correct answer and the
output of your program in case it didn't crash, finished under the time limit and memory limit
constraints. If the program crashed, worked too long or used too much memory, the system stops
it, so you won't see the output of your program or will see just part of the whole output. We show
you all this information so that you get used to the algorithmic problems in general and get some
experience debugging your programs while knowing exactly on which tests they fail.
However, in the following Programming Assignments throughout the Specialization you will only
get this much information for the test cases from the problem statement. For the next tests you will
only get the result: passed, time limit exceeded, memory limit exceeded, wrong answer, wrong
output format or some form of crash. We hide the test cases, because it is crucial for you to learn
to test and fix your program even without knowing exactly the test on which it fails. In real life,
there will often be no or only partial information about the failure of your program or service. You
will need to find the failing test case yourself. Stress testing is one powerful technique that allows
you to do that. You should apply it after using the other testing techniques covered in
this reading from the "Algorithmic Toolbox" course.
Often beginner programmers think by default that their programs work. Experienced programmers
know, however, that their programs almost never work initially. Everyone who wants to become a
better programmer needs to go through this realization.
When you are sure that your program works by default, you just throw a few random test cases
against it, and if the answers look reasonable, you consider your work done. However, mostly this
is not enough. To make one's programs work, one must test them really well. Sometimes, the
programs still don't work although you tried really hard to test them, and you need to be both
skilled and creative to fix your bugs. Solutions to algorithmic problems are one of the hardest to
implement correctly. That's why in this Specialization you will gain this important experience which
will be invaluable in the future when you write programs which you really need to get right.
It is crucial for you to learn to test and fix your programs yourself. In real life, there will often be
no or only partial information about the failure of your program or service. Still, you will have to
reproduce the failure to fix it (or just guess what it is, but that's rare, and you will still need to
reproduce the failure to make sure you have really fixed it). When you solve algorithmic problems,
it is very frequent to make subtle mistakes. That's why you should apply the testing techniques
described in this reading from the "Algorithmic Toolbox" course to find the failing test case and fix
your program.
My solution does not pass the tests. May I post it in the forum and ask for
help?
No, please do not post any solutions in the forum or anywhere on the web, even if a solution does
not pass the tests (as in this case you are still revealing parts of a correct solution). Recall the
third item of the Coursera Honor Code: "I will not make solutions to homework, quizzes, exams,
projects, and other assignments available to anyone else (except to the extent an assignment
explicitly permits sharing solutions). This includes both solutions written by me, as well as any
solutions provided by the course staff or others''.
Currently, we are going to support C++, Java, and Python only, but we may add other
programming languages later if there is significant demand. To express your interest in a
particular programming language, please post its name in this thread (in the forum of the
"Algorithmic Toolbox" course) or upvote the corresponding option if it is already there.
My implementation always fails in the grader, though I already tested and stress
tested it a lot. Wouldn’t it be better if you give me a solution to this problem or
at least the test cases that you use? I will then be able to fix my code and will
learn how to avoid making mistakes. Otherwise, I don’t feel that I learn anything
from solving this problem. I’m just stuck.
First of all, it is just not true that you do not learn by trying to fix your implementation.
The process of trying to invent new test cases that might fail your program and proving them
wrong is often enlightening. Thinking about the invariants which you expect your loops, ifs,
etc. to maintain, and proving them wrong (or right), makes you understand much better what
happens inside your program and in the general algorithm you're studying.
Also, it is important to be able to find a bug in your implementation without knowing a test case
and without having a reference solution. Assume that you designed an application and an
annoyed user reports that it crashed. Most probably, the user will not tell you the exact sequence
of operations that led to a crash. Moreover, there will be no reference application. Hence, once
again, it is important to be able to locate a bug in your implementation yourself, without a magic
oracle giving you either a test case that your program fails or a reference solution. We encourage
you to use programming assignments in this class as a way of practicing this important skill.
If you’ve already tested a lot (considered all corner cases that you can imagine, constructed a set
of manual test cases, applied stress testing), but your program still fails and you are stuck, try to
ask for help on the forum. We encourage you to do this by first explaining what kind of corner
cases you have already considered (it may happen that when writing such a post you will realize
that you missed some corner cases!) and only then asking other learners to give you more ideas
for test cases.
In this module, we discuss Dynamic Arrays: a way of using arrays when it is unknown ahead of
time how many elements will be needed. Here, we also discuss amortized analysis: a method of
determining the amortized cost of an operation over a sequence of operations. Amortized analysis
is very often used to analyze the performance of algorithms when a straightforward analysis
produces unsatisfactory results, but amortized analysis helps to show that the algorithm is actually
efficient. It is used both for the Dynamic Arrays analysis and will also be used at the end of this
course to analyze Splay trees.
Key Concepts
Describe how dynamic arrays work
Calculate amortized running time of operations
List the methods for amortized analysis
Dynamic Arrays
So in this lecture, we're going to talk about dynamic arrays and amortized analysis.
In this video we're going to talk about dynamic arrays.
So the problem with static arrays is, well, they're static.
Once you declare them, they don't change size, and you have to determine that size at compile time.
So one solution is what are called dynamically-allocated arrays. There you can actually allocate the
array, determining the size of that array at runtime. So that gets allocated from dynamic memory. So
that's an advantage. The problem is, what if you don't know the maximum size at the time you're
allocating the array?
A simple example, you're reading a bunch of numbers. You need to put them in an array. But you
don't know how many numbers there'll be. You just know there'll be some mark at the end that says
we're done with the numbers.
Play video starting at 1 minute 1 second and follow transcript1:01
So, how big do you make it? Do you make it 1,000 big? But then what if there are 2,000 elements?
Make it 10,000 big? But what if there are 20,000 elements? So, a solution to this. There's a saying that
says all problems in computer science can be solved by another level of indirection. And that's the
idea here. We use a level of indirection. Rather than directly storing a reference to the either static or
dynamically allocated array, we're going to store a pointer to our dynamically allocated array. And
that allows us then to update that pointer. So if we start adding more and more elements, when we
add too many, we can go ahead and allocate a new array, copy over the old elements, get rid of the
old array, and then update our pointer to that new array. So these are called dynamic arrays or
sometimes they're called resizable arrays. And this is distinct from dynamically allocated arrays.
Where we allocate an array, but once it's allocated it doesn't change size.
Play video starting at 2 minutes 2 seconds and follow transcript2:02
So a dynamic array is an abstract data type, and basically you want it to look kind of like an array. So it has the following operations, at a minimum. It has a Get operation, that takes an index and returns you the element at that index, and a Set operation, that sets the element at a particular index to a particular value.
Both of those operations have to be constant time, because that's kind of what it means to be an array: we have random access with constant time to the elements. We can PushBack, which adds a new element at the end of the array.
We can remove an element at a particular index. And that'll shuffle down all the succeeding ones. And finally, we can find out how many elements are in the array. How do we implement this? Well, we're going to store arr, which is our dynamically-allocated array. We're going to store capacity, which is the size of that dynamically-allocated array, how large it is. And then size is the number of elements that we're currently using in the array. Let's look at an example. So let's say our dynamically allocated array has a capacity of 2. But we're not using any elements in it yet, so it's of size 0. And arr then points to that dynamically allocated array.
If we do a PushBack of a, that's going to go ahead and put a into the array and update the size.
If we now push b, it's going to put b into the array and update the size.
Notice now the size is equal to the capacity, which means this dynamically allocated array is full. So if we get asked to do another PushBack, we've got to go allocate a new dynamically-allocated array. We're going to make that larger, in this case it's of size 4. And then we copy over each of the elements from the old array to the new array.
Once we've copied them over, we can go ahead and update our array pointer to point to this new dynamically allocated array, and then dispose of the old array.
At this point we finally have our new dynamically allocated array, that has room to push another element, so we push in c.
We push in d; if there is room we put it in and update the size. And now if we try and push another element, again we have a problem, we're too big. We can allocate a new array. In this case, we're going to make it of size 8. We'll talk about how you determine that size somewhat later.
And then copy over a, b, c, and d, update the array pointer, de-allocate the old array, and now we have room, so we can push in e. So that's how dynamic arrays work. Let's look at some of the implementations of the particular API methods.
Get is fairly simple. We're going to assume, for the sake of argument, that we are doing 0-based indexing here. So if we want to Get(i), we first check and make sure that i is in range. That is, is it within the range from 0 to size - 1? Because if it's less than 0, or greater than or equal to size, it's out of range and that will be an error.
If we're in range, then we just return index i from the dynamically allocated array.
Set is very similar. Check to make sure our index is in bounds, and then if it is, update index i of the array to val. PushBack is a little more complicated. So, let's actually skip the if statement for now and just say that there is empty space in our dynamic array. In that case, we just set the array at index size to val and then increment size.
If, however, we're full, that is, if size is equal to capacity, then we go ahead and allocate a new array. We're going to make it twice the capacity, and then we go through a for loop, copying over every one of the elements from the existing array to the new array.
We free up the old array and then set arr to the new one.
At that point, we've got space, so we go ahead and set the element at index size and then increment size.
Remove is fairly simple. Check that our index is in bounds and then go through a loop, copying over successive elements, and then decrement the size.
Size is simple: we just return size.
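The operations described above can be sketched in a few lines. This is a minimal, illustrative Python version of the lecture's pseudocode (the names Get, Set, PushBack, Remove, and Size follow the lecture; a fixed-size Python list stands in for the raw dynamically allocated array):

```python
# A minimal sketch of the dynamic-array API from the lecture.
# Illustrative only: real implementations (C++ vector, Java ArrayList)
# manage raw memory; here a fixed-size Python list plays that role.

class DynamicArray:
    def __init__(self):
        self.capacity = 2                  # size of the underlying allocation
        self.size = 0                      # number of elements in use
        self.arr = [None] * self.capacity  # the "dynamically allocated array"

    def Get(self, i):
        if i < 0 or i >= self.size:        # index must be in [0, size)
            raise IndexError(i)
        return self.arr[i]

    def Set(self, i, val):
        if i < 0 or i >= self.size:
            raise IndexError(i)
        self.arr[i] = val

    def PushBack(self, val):
        if self.size == self.capacity:
            # Full: allocate a new array of twice the capacity,
            # copy every element over, and switch to it.
            new_arr = [None] * (2 * self.capacity)
            for j in range(self.size):
                new_arr[j] = self.arr[j]
            self.arr = new_arr
            self.capacity *= 2
        self.arr[self.size] = val
        self.size += 1

    def Remove(self, i):
        if i < 0 or i >= self.size:
            raise IndexError(i)
        for j in range(i, self.size - 1):  # shuffle down succeeding elements
            self.arr[j] = self.arr[j + 1]
        self.size -= 1

    def Size(self):
        return self.size
```

Starting from a capacity of 2, pushing five elements triggers two resizes (to capacity 4 and then 8), exactly as in the a-through-e walkthrough above.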
There are common implementations of these dynamic arrays. C++'s vector class is an example of a dynamic array. And there, notice, it uses C++ operator overloading, so you can use the standard array syntax of brackets to either read from or write to an element. Java has ArrayList. Python has the list, and there are no static arrays in Python: all of them are dynamic. What's the runtime? We saw Get and Set are O(1), as they should be. PushBack is O(n), although we're going to see that's only the worst case. Most of the time, when you call PushBack, it's not having to do the expensive operation, that is, the size is not equal to the capacity. For now, though, we're just going to say that it's O(n). We'll look at a more detailed analysis when we get into aggregate analysis in our next video.
Removing is O(n), because we've got to move all those elements. Size is O(1).
So in summary, unlike static arrays, dynamic arrays are dynamic. That is, they can be resized.
Appending a new element to a dynamic array is often constant time, but it can take O(n). We're going to look at a more nuanced analysis in the next video. And some space is wasted. In our case, if we're resizing by a factor of two, at most half the space is wasted. If we were making our new array three times as big, then we could waste up to two-thirds of our space. If we're only making it 1.5 times as big, then we would waste less space. It's worth noting a dynamic array can also be resized smaller; that's possible too. It's worth thinking about what would happen if we resized our array to a smaller dynamic array as soon as we got under one-half utilization: it turns out we can come up with a sequence of operations that gets to be quite expensive.
In the next video, we're going to talk about amortized analysis. And in particular, we're going to look at one method called the aggregate method.
So we'll discuss now what Amortized Analysis is and look at a particular method for doing such analysis.
Sometimes, looking at the worst case of an individual operation may be too severe. In particular, we may want to know the total worst case for a sequence of operations, where some of those operations are cheap and only certain ones are expensive. So if we take the worst-case cost of any one operation and multiply that by the total number of operations, we may be overstating the total cost.
As an example, for a dynamic array, we only resize every so often. Most of the time, we're doing a constant time operation, just adding an element. It's only when we fully reach the capacity that we have to resize. So the question is, what's the total cost if you have to insert a bunch of items?
So here's the definition of amortized cost. You have a sequence of n operations; the amortized cost is the cost of those n operations divided by n.
This is similar in spirit to, let's say, buying a car for $6,000 and figuring it's going to last you five years.
Now, you have two possibilities. One, you pay the $6,000 and then five years later you have to pony up another $6,000. Another option would be to put aside money every month. Five years is 60 months, so if you put away $100 a month, once the five years is over and it's time to buy a new car for $6,000, you'll have $6,000 in your bank account. And so there the amortized (monthly) cost is $100 a month, whereas the worst-case monthly cost is actually $6,000: it's $0 for 59 months and then $6,000 in one month. So you can see that the amortized cost gives you a more balanced understanding. If you really want to know the most you spend in any month, the answer is $6,000. But if you want to know, on average, what you're spending, $100 is a more reasonable number. So that's why we do this amortized analysis: to get a more nuanced picture of what a succession of operations looks like.
So let's look at the aggregate method of doing amortized analysis. The aggregate method really says: let's look at the definition of what an amortized cost is, and use that to calculate directly.
So we're going to look at an example of a dynamic array and do n calls to PushBack. We're going to start with an empty array and call PushBack n times.
And then we'll find out what the amortized cost is of a single call to PushBack. We know the worst-case time is O(n). Let's define c sub i as the cost of the i'th insertion. So we're interested in c1 to cn.
So c sub i includes 1, because what we're counting here is writes into the array, and we have to write in this i'th element that we're adding, regardless of whether or not we need to resize.
If we need to resize, the first question is: when do we need to resize? We need to resize if our capacity is used up, that is, if the size is equal to the capacity. Well, when does that happen? That happens if the previous insertion filled it up, that is, made the size a power of 2, because in our case we're always doubling the size. So that says on the i'th insertion we're going to have to resize if the (i-1)'th insertion filled it up, that is, if i - 1 is a power of 2.
And if we don't have to resize, there's no additional cost; it's just zero.
So the total amortized cost is the sum of the n actual costs divided by n. That's the summation from i = 1 to n of c sub i, where again c sub i is the cost of the i'th insertion. That's equal to n, because every c sub i includes a cost of 1 and we sum that n times, plus the summation from j = 0 to the floor of log base 2 of (n - 1) of 2 to the j. That just says: sum the powers of two, all the way up to n - 1. To give an example, if n is 100, the powers of 2 are 1, 2, 4, 8, 16, 32, and 64, and it's the summation of all of those. Well, that summation is just order n.
Right. We basically take powers of 2 up to, but not including, n, and that sum is no more than 2n. So we've got n plus something no more than 2n; that's clearly O(n), divided by n, and that's just O(1). So what we've determined is that we have an amortized cost for each insertion of order 1.
Our worst-case cost is still order n, so if we want to know how long any particular insertion will take in the worst case, it's O(n), but the amortized cost is O(1).
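This aggregate calculation can be checked numerically. Here is a small sketch, assuming the lecture's cost model: each insertion writes one element, and insertion i additionally copies i - 1 elements when i - 1 is a power of two.

```python
# Total write cost of n PushBacks under the doubling scheme:
# insertion i costs 1 write, plus (i - 1) copies whenever i - 1 is a
# power of two (i.e. the previous insertion filled the array).

def total_cost(n):
    total = 0
    for i in range(1, n + 1):
        cost = 1                           # write the i'th element
        m = i - 1
        if m >= 1 and (m & (m - 1)) == 0:  # m is a power of two
            cost += m                      # copy the m old elements
        total += cost
    return total

# n plus the powers of two below n is less than n + 2n, so the
# amortized cost total_cost(n) / n stays below 3: O(1) per PushBack.
for n in (100, 10000, 1000000):
    print(n, total_cost(n) / n)
```

For n = 100 this gives 100 + (1 + 2 + 4 + 8 + 16 + 32 + 64) = 227 total, or 2.27 per insertion.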
In the next video, we're going to look at an alternative way to do this amortized analysis.
In this video, we're going to talk about a second way to do Amortized Analysis, what we call the Banker's Method.
The idea here is that we're going to charge extra for each cheap operation. It's sort of like the example where we looked at saving money for a car: we're going to actually take that $100 and put it in the bank. And then we save those charges somewhere; in the case of the bank, we put it in the bank. In our case, we're going to conceptually save it in our data structure. We're not actually changing our code; this is strictly an analysis. But we're conceptually thinking of our saved extra cost as tokens in our data structure that later on we'll be able to use to pay for the expensive operations. This will make more sense as we see an example.
So it's kind of like an amortizing loan, or the case I talked about where we're saving $100 a month towards a $6,000 car, because we know our current car is going to wear out.
Let's look at the same example where we have a dynamic array and n calls to PushBack, starting with an empty array. The idea is we're going to charge 3 for every insertion. So for every PushBack, we're going to charge 3. One is the raw cost for actually moving the new item into the array, and the other two are going to be saved.
So if we need to do a resize, in order to pay for moving the elements, we're going to use tokens we've already saved. And then, once we've actually added our item, we're going to place the 2 saved tokens: 1 token on the item we added and 1 token on an item prior to it in the array. It'll be easier to see when we look at a particular example.
Let's look at an example. We have an empty array, starting with size 0, capacity 0. We PushBack(a); what happens? Well, we have to allocate our array of size one, point to it, and then put a into the array. And now we're going to put a little token on a, and this token is what we use later on to pay for moving a. In this particular example, for the very first element there's no other element to put a token on, so we're just going to waste that other, third token. We push in b. There's no space for b, so we've got to allocate a larger array and then move a. How are we going to pay for moving a? With the token that's already on it. So we prepaid for moving a: when we initially put a into the array, we put a token on it that would pay for moving it into a new array. So that's how we pay for moving a, and then we update the array pointer, delete the old one, and now we actually put b in. So we put b in at a cost of one, and we still have two more tokens to pay. We're going to put one on b, and we're going to put one capacity/2 positions earlier, that is, one element earlier, on a. So we've spent three now: one for real and two as deferred payment that we're going to use later in the form of these tokens.
Remember, these tokens are not actually stored in the data structure. There's nothing actually in the array. This is just something we're using for mental accounting in order to do our analysis.
When we push in c, we're going to allocate a new array. We copy over a, and we pay for that with our prepaid token. We copy over b, paying for that with our prepaid token. And now we push in c.
That's one; the second payment we have to make is a token on c, and then we put a token on a, which is capacity/2 = two elements prior.
We push in d; we don't have to do any resizing, finally. So we just put in d, and that's a cost of one. Second, put a token on d. Third, put a token capacity/2, or two elements, prior to that. So notice what we've got now is a full array where everything has a token on it, which means when we need to resize, we have prepaid for all of that movement. So we push in e, allocate a new array, and use those prepaid tokens to pay for moving a, b, c, and d. Get rid of the old array, and push in e. And again, put a token on e and a token on a.
So what we've got here is an O(1) amortized cost for each PushBack. And in particular, we have a cost of three.
So let's look back at how we did this.
For this dynamic array we decided we had to charge three; for other data structures with other operations we might have to charge a different amount. We have to figure out what will be sufficient; in our case three was sufficient. And we decided that we would go ahead and store these tokens on the elements that needed to be moved. So it's a very physical way to keep track of the saved work, or the prepaid work, that we have done. So we charge 3, where 1 is the raw cost of insertion. If we need to resize, we've arranged things such that whenever the array gets full, every element has a token on it: in order for the array to have become full, we had to have done enough PushBacks that all the new elements added since the previous resize got tokens, and every time we added one of those new ones, we prepaid for a prior element as well.
So we pay one for the insertion itself, we pay one token for the element we're adding now, and we pay one token for sort of a buddy element earlier.
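This token accounting can be simulated. Below is a sketch (a hypothetical helper, not part of the lecture's code) that deposits a fixed charge per PushBack, pays the true cost out of the accumulated balance, and reports whether the balance ever goes negative:

```python
# Banker's-method bookkeeping for n PushBacks on a doubling array:
# each call deposits `charge` tokens; a resize withdraws one token per
# copied element, and every insertion withdraws one for the write.
# Nothing here changes the data structure: it is analysis only.

def balance_stays_nonnegative(n, charge):
    size, capacity, balance = 0, 1, 0
    for _ in range(n):
        balance += charge       # amortized charge for this PushBack
        if size == capacity:
            balance -= size     # pay one token per moved element
            capacity *= 2
        balance -= 1            # pay for writing the new element
        if balance < 0:
            return False        # the charge was not sufficient
        size += 1
    return True
```

A charge of 3 per operation keeps the balance non-negative indefinitely, while a charge of 2 goes broke as soon as a larger resize arrives.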
In the next video we're going to look at a third way of doing Amortized Analysis, which is the
Physicist's Method.
Now, let's talk about the final way to do amortized analysis, which is the physicist's method. The idea of the physicist's method is to define a potential function: a function that takes a state of a data structure and maps it to an integer, its potential.
This is similar in spirit to what you may have learned in high school physics, the idea of potential energy. For instance, if you have a ball and you take it up to the top of a hill, you've increased its potential energy. If you then let the ball roll down the hill, its potential energy decreases and gets converted into kinetic energy, which increases.
We do the same sort of thing for our data structure, storing in it the potential to do future work.
A couple of rules about the potential function. First, phi(h sub 0) = 0. Here phi is the potential function, and h sub 0 is the data structure h at time 0, that is, its initial state; that state has to have a potential of 0.
The second rule is that the potential is never negative: at any point in time, phi(h sub t) is greater than or equal to 0. Once we've defined the potential function, we can then say what the amortized cost is. The amortized cost of operation t is c sub t, the true cost, plus the change in potential between before doing the operation and after doing the operation. Before doing the operation we have phi(h sub t-1); after, we have phi(h sub t). So it's c sub t + phi(h sub t) - phi(h sub t-1).
What we need to do is choose a function phi such that, if the actual cost is small, the potential increases, so that we're saving up some potential for doing later work. And if c sub t is large, then we want the potential to decrease, in a way, to pay for that work. So the cost of n operations is the sum of the true costs, which is the summation from i = 1 to n of c sub i. And what we want to do is relate the sum of the true costs to the sum of the amortized costs. The sum of the amortized costs is the summation from i = 1 to n of the definition of the amortized cost, which is (c sub i + phi(h sub i) - phi(h sub i-1)).
Or we can just rewrite that, removing the summation: c sub 1 + phi(h sub 1) - phi(h sub 0), plus c sub 2 + phi(h sub 2) - phi(h sub 1), and so on. What's important to note is that we have a phi(h sub 1) in the first term and then a minus phi(h sub 1) in the second term, so those two cancel out. Similarly, we have a phi(h sub 2) in the second term and a minus phi(h sub 2) when we look at the amortized cost at time three. And that goes on and on, until at time n-1 we would have a positive phi(h sub n-1) and then a negative phi(h sub n-1). So with all these cancellations, all we're left with is the very first term, minus phi(h sub 0), and the very last term, which is phi(h sub n). So this just equals phi(h sub n) minus phi(h sub 0), because all the other phis cancel, plus the summation from i = 1 to n of c sub i, the true costs. Since phi(h sub n) is non-negative and phi(h sub 0) is 0, this value is greater than or equal to just the summation of the true costs.
What that means is that the sum of the true costs is a lower bound on the sum of the amortized costs. So if we want to bound the cost of an entire sequence of operations, we know the sum of the amortized costs is at least the sum of the true costs: the amortized costs give an upper bound on the true cost of the sequence.
So let's look at applying this physicist's method to the dynamic array. We're going to look at n calls to PushBack.
Phi(h), at any given time, is two times the size minus the capacity.
So as the size increases, the potential is increasing for a given fixed capacity.
Let's check that our phi function satisfies our requirements. First, phi(h sub 0) is 2 x 0 - 0, assuming we have an initial array of size 0, and that's just 0. Also, phi(h sub i) is 2 x size - capacity. We know that the size is at least capacity over 2, so 2 x size - capacity is greater than or equal to 0.
Now let's look at our amortized cost. First, assume we don't have to do a resize. We add a particular element i, and the amortized cost is the cost of insertion plus phi(h sub i) - phi(h sub i-1). The cost of insertion is just 1, because we're adding an element and we don't have to do any moving of elements. Phi(h sub i) is 2 x size sub i - capacity sub i, and phi(h sub i-1) is 2 x size sub i-1 - capacity sub i-1.
Well, what do we know? Since we're not resizing, the capacities don't change, so the capacities cancel out. And we're left with 2 times the difference in sizes. What's the difference in size? It's just 1, because we added one element, so this is 1 + 2 x 1, or 3.
It's no accident that this 3 is the same value that we saw when we used the banker's method.
And then let's look at the cost when we have to do a resize. We're going to define k as size sub i-1, which is the same thing as capacity sub i-1. Why is it the same? Because we're about to do a resize, which means that after the previous operation we must have made the dynamic array full. Then phi(h sub i-1) is 2 times the old size minus the old capacity, and that's just 2k - k, or k. Phi(h sub i) is 2 times size sub i minus capacity sub i, and that's 2(k + 1), because size sub i is one more than size sub i-1, minus 2k. Why 2k? Because we double the capacity each time. So that's just equal to 2. So the amortized cost of adding the element is c sub i + phi(h sub i) - phi(h sub i-1). The true cost c sub i is size sub i, because we have to move size sub i-1 elements and then add the one new element. So we have (size sub i) + 2 - k, which is just (k + 1) + 2 - k, which is 3.
So what we have seen now is that the amortized cost of adding elements, using the physicist's method, is 3.
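Both cases can be checked mechanically. Here is a sketch that replays n PushBacks, computes c sub i + phi(h sub i) - phi(h sub i-1) with phi(h) = 2 x size - capacity, and confirms the per-insertion amortized cost. (One convention choice here: the very first insertion grows the capacity from 0 to 1 rather than doubling, so it comes out slightly below 3.)

```python
# Physicist's-method check: with Phi(h) = 2*size - capacity, the
# amortized cost of each PushBack (true cost plus change in potential)
# is 3 for every doubling resize and every ordinary insertion.

def amortized_costs(n):
    size, capacity = 0, 0                 # Phi(h_0) = 2*0 - 0 = 0
    costs = []
    for _ in range(n):
        phi_prev = 2 * size - capacity
        if size == capacity:
            cost = size + 1               # move `size` elements, write one
            capacity = 2 * size if size > 0 else 1
        else:
            cost = 1                      # just write the new element
        size += 1
        costs.append(cost + (2 * size - capacity) - phi_prev)
    return costs
```

For example, amortized_costs(8) gives 2 for the first insertion (the capacity-0-to-1 step) and 3 for every insertion after it, matching the calculation above.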
Let's go back to the dynamic array.
So are there alternatives to doubling the array size?
Right, we doubled each time.
What happens if we didn't double?
Well we could use some different growth factor.
So for instance, we could use 2.5.
So grow the array by more than two, or grow the array by less than two.
As long as we used some constant multiplicative factor, we'd be fine.
The question is, can we use a constant amount?
Can we add a particular amount, like, let's say, 10 each time?
And the answer is really, no.
And the reason is, as the array gets bigger and bigger, and we have to resize
every ten insertions, we just don't accumulate enough cheap operations
to pay for actually doing the movement.
Let's look at this another way,
using the aggregate method.
Let's say c sub i is the cost of the i'th insertion.
We're going to define that as 1, for putting in the i'th element,
plus i - 1 if the (i-1)'th insertion makes the
dynamic array full,
that is, if i - 1 is a multiple of 10, and 0 otherwise.
By the definition of the aggregate method, the amortized cost is
the sum of the total costs divided by n. The 1s summed n times give n,
plus the summation from j = 1 to (n-1)/10 of 10j,
that is, just the multiples of 10,
all the way up to but not including n:
10, 20, 30, 40 and so on.
All that divided by n.
Well, we can pull the 10 out of that summation, so
it's just 10 x the summation from j = 1 to (n-1)/10 of j.
So that's just the numbers 1, 2, 3, 4, and so on, all the way up to (n-1)/10.
That summation is O(n^2).
So we've got (n + 10 x O(n^2)) / n = O(n^2) / n = O(n).
So this shows that if we use a constant amount to grow the dynamic
array each time, then we end up with an amortized cost for PushBack of O(n)
rather than O(1).
So it's extremely important to use a constant multiplicative factor.
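The gap between the two growth policies is easy to see empirically. This sketch compares the total write-plus-copy work of n PushBacks for doubling versus adding a constant 10 slots (both starting from capacity 10, an illustrative choice):

```python
# Total work (writes plus copies) of n PushBacks under two growth
# policies: doubling the capacity, versus adding 10 slots each time.
# Doubling keeps the total linear in n; constant growth is quadratic.

def total_work(n, grow):
    size, capacity, work = 0, 10, 0
    for _ in range(n):
        if size == capacity:
            work += size        # copy all existing elements
            capacity = grow(capacity)
        work += 1               # write the new element
        size += 1
    return work

n = 100000
doubling = total_work(n, lambda c: 2 * c)   # stays under 3n
additive = total_work(n, lambda c: c + 10)  # on the order of n*n/20
print(doubling, additive)
```

For n = 100000 the doubling policy does fewer than 3n total operations, while the additive policy does hundreds of millions: thousands of times more work.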
So in summary, we can calculate the amortized cost
in the context of a sequence of operations.
Rather than looking at a single operation in its worst case, we look at the totality
of a sequence of operations.
We have three ways to do the analysis.
The aggregate method,
where we just do the brute-force sum based on the definition of the amortized cost.
The banker's method, where we use tokens and
save them conceptually in the data structure.
Or the physicist's method, where we define a potential function
and look at the change in that potential.
Nothing changes in the code.
We're only doing runtime analysis, so
the code doesn't actually store any tokens at all.
That's an important thing to remember.
That is dynamic arrays and amortized analysis.
Slides
Download the slides for dynamic arrays and amortized analysis here:
05_4_dynamic_arrays_and_amortized_analysis.pdf (PDF file)
References
See chapter 17 of [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford
Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Additional Video
This external video may be useful to give another perspective on amortized analysis in general,
and the banker's method in particular.
Week 3
Data Structures
Introduction
Hello everybody. Welcome back.
Today, I'm going to be talking about priority queues.
This popular data structure has built-in implementations in many programming languages, for
example in C++, Java, and Python, and in this lesson we will learn what is going on inside these
implementations. We will see beautiful combinatorial ideas that allow us to store the contents of a
priority queue in a complete binary tree, which is in turn stored just as an array. This gives an
implementation which is both time and space efficient, and it can be implemented in just a few lines
of code. A priority queue data structure is a generalization of the standard queue data structure.
Recall that the queue data structure supports the following two main operations. We have a queue,
and when a new element arrives, we put it at the end of this queue by calling the method
PushBack(e). And when we need to process the next element, we extract it from the beginning of the
queue by calling the method PopFront(). In the priority queue data structure, there is no such thing as
the beginning or the end of a queue. Instead we have just a bag of elements, but each element is
assigned a priority. When a new element arrives, we just put it inside this bag by calling the method
Insert. However, when we need to process the next element from this bag, we call the method
ExtractMax, which finds an element inside this bag whose priority is currently maximum.
A typical use case for priority queues is the following. Assume that we have a machine and we would
like to use this machine for processing jobs. It takes time to process a job and when we are processing
the current job, a new job may arrive.
So we would like to be able to quickly perform the following operations. First of all, when a new job
arrives, we would like to quickly insert it into the pool of pending jobs, right? And when we are
done with the current job, we would like to be able to quickly find the next job, that is, the job with
the maximum priority.
Okay, and now we are ready to state the definition of priority queue formally. Formally a priority
queue is an abstract data type which supports the two main operations, Insert and ExtractMax.
Consider a toy example. We have a priority queue which is initially empty. We then insert element 5
in it, we then insert 7, then insert 1, and then insert 4.
So we put these elements in random places inside this box on the left, just to emphasize, once again,
that there is no such thing as the beginning or the end of a priority queue. So it is not important how
the elements are stored inside the priority queue. What is important for us now is that if we call
ExtractMax() for this priority queue, then an element with the currently highest priority should be
extracted. In our toy example it is 7. So if we call ExtractMax for this priority queue, then 7 is taken
out of the priority queue. Then, well let's insert 3 into our priority queue and now let's call
ExtractMax(). The currently highest priority is 5, so we extract 5. Then we ExtractMax() once again,
and now it is 4, okay? Some additional operations that we might expect from a particular
implementation of a priority queue data structure are the following. So first of all, we might want to
remove an element. I mean, not to extract an element with a maximum priority, but to remove a
particular element given by an iterator, for example. Also, we might want to find the maximum
priority without extracting an element with a maximum priority. So GetMax is an operation which is
responsible for this. And also, we might want to change the priority of a given element. I mean, to
increase or to decrease its priority. So ChangePriority(it, p) is the operation responsible for this. Let us
conclude this introductory video by mentioning a few examples of famous algorithms that rely
essentially on priority queues.
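The interface just described (Insert, ExtractMax, GetMax) can be sketched in a few lines of Python. Python's built-in `heapq` module is a min-heap, so this sketch (class and method names are ours) negates priorities to get max-queue behavior:

```python
import heapq

class MaxPriorityQueue:
    """A max-priority queue sketched on top of Python's heapq (a min-heap)
    by storing negated priorities."""
    def __init__(self):
        self._heap = []

    def insert(self, p):
        heapq.heappush(self._heap, -p)     # O(log n)

    def get_max(self):
        return -self._heap[0]              # peek without extracting, O(1)

    def extract_max(self):
        return -heapq.heappop(self._heap)  # O(log n)

# Replaying the toy example: insert 5, 7, 1, 4 ...
pq = MaxPriorityQueue()
for p in (5, 7, 1, 4):
    pq.insert(p)
assert pq.extract_max() == 7   # ... ExtractMax returns 7
pq.insert(3)
assert pq.extract_max() == 5   # then 5
assert pq.extract_max() == 4   # then 4
```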
Dijkstra's algorithm uses priority queues to efficiently find the shortest path from point a to point b on
a map or in a graph.
Prim's algorithm uses priority queues to find a minimum spanning tree in a graph. This might be
useful for example in the following case. Assume that you have a set of computers and you would like
to connect them in a network by putting wires between some pairs of them. And you would like to
minimize the total price or the total length of all the wires.
Huffman's algorithm computes an optimum prefix-free encoding of a string or a file. It is used, for
example, in MP3 audio format encoding algorithms.
Finally, the heap sort algorithm uses priority queues to efficiently sort n given objects. It is a
comparison-based algorithm, and its running time is O(n log n), even in the worst case. Another
advantage of this algorithm is that it is in place: it uses no extra memory for sorting the input
data. We will go through all these algorithms in this specialization, and the heap sort algorithm will
even be covered in this lesson, in the forthcoming videos.
Naive Implementations of Priority Queues
As usual, before going into the details of an efficient implementation, let's check what is wrong with
naive implementations. For example, what if we store the contents of a priority queue just in an
unsorted array or in an unsorted list? In this example on the slide, we use a doubly linked list. In this
case, inserting a new element is very easy: we just append the new element to the end of our array or
list. For example, if our new element is 7, we can just put it into the next available cell in our array, or
we can just append it to the end of the list. So we put 7 at the end, we say that the previous element
of 7 is 2, and that there is no next element, right? So it is easy and it takes constant time, okay? Now,
what about extracting the maximum element in this case? Well, unfortunately we need to scan the
whole array to find the maximum element.
And we need to scan the whole list to find the maximum element, which gives us a linear running
time, that is, O(n), right? So in our naive implementation using an unsorted array or list, the
running time of the ExtractMax operation is linear. A reasonable approach to try to improve
this is to keep the contents of our array sorted. What are the advantages of this approach? Well, of
course, in this case ExtractMax is very easy: the maximum element is just the last element of our
array, right? This means that the running time of ExtractMax in this case is just constant. However,
the disadvantage is that now the insertion operation takes linear time, and this is why. To find the
right position for the new element we can use binary search, which is actually good: it can be done
in logarithmic time. For example, if we need to insert 7 into our priority queue, then in logarithmic
time we will find out that it should be inserted between 3 and 9. However, unfortunately, after
finding this right position, we need to shift everything to the right of this position by one.
Right, just to create a vacant position for 7. For this, we first shift 16 to this cell, then we move 10
to this cell, then we move 9 to this cell, and finally we put 7 into this cell, and the array is sorted
again. So in the worst case, we need to shift a linear number of elements, which gives us a linear
running time for the insertion operation. As we've just seen,
inserting an element into a sorted array is expensive because to insert an element into the middle we
need to shift all elements to the right of this position by one. So, this makes the running time of the
insertion procedure linear. However, if we use a doubly linked list, then inserting into the middle of
this list is actually a constant time operation. So let's try to use a sorted list. Well, the first advantage is
that the extract max operation still takes constant time. Well this is just because, well, the maximum
element in our list is just the last element, right? So for this reason, we have a constant time for
extract max. Also, another advantage is that inserting in the middle of this list actually takes a
constant amount of work, not linear, and this is why. Again let's try to insert 7 into our list. Well, this
can be done as follows.
We know that 7 should be inserted between 3 and 9. So we do just the following. We remove this
pointer and this pointer. We say that now the next element after 3 is 7 and the previous element
before 7 is 3, and also that the next element after 7 is 9 and the previous element before 9 is 7. So
inserting an element just involves changing four pointers, right? So it is a constant time operation.
However, everything is not so easy, unfortunately, and this is because just finding the right position
for inserting this new element takes a linear amount of work. In particular, this is because we cannot
use binary search for lists. Given the first element of this list and the last element of this list, we
cannot find the middle element of this list, because this is not an array: we cannot just compute the
middle index. So for this reason, just finding the right position for the new element, I mean, to keep
the list sorted, already takes a linear amount of work. And for this reason, inserting into a sorted list
still takes a linear amount of work.
To conclude: if you implement a priority queue using an array or a list, sorted or not, then at least one
of the operations Insert and ExtractMax takes a linear amount of work. In the next video we will show
a data structure called a binary heap, which allows us to implement a priority queue so that both of
these operations can be performed in a logarithmic amount of work.
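The first naive implementation above, an unsorted array, can be sketched as follows (the class name is ours): Insert appends in constant time, while ExtractMax has to scan the whole array.

```python
class UnsortedArrayPQ:
    """Naive priority queue on an unsorted array:
    Insert is O(1), ExtractMax scans the whole array, O(n)."""
    def __init__(self):
        self._a = []

    def insert(self, p):
        self._a.append(p)                # append to the end: constant time

    def extract_max(self):
        i = max(range(len(self._a)), key=self._a.__getitem__)  # linear scan
        return self._a.pop(i)
```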
Slides
06_1_priority_queues_1_intro.pdf
In the previous module, we defined the height of a tree as the number of nodes on a longest path
from the root to a leaf. In this module, we use a slightly different definition of the height: we define
it to be equal to the number of edges on the longest path from the root to a leaf. In particular, the
height of a tree that consists of one node is equal to 0, and the height of the tree shown below is
equal to 3.
Both definitions of height are used in practice frequently, so it is always a matter of context. If
there is no definition in the text, you should look at the examples discussed to understand which of
the two definitions is used.
Basic Operations
Let's see how basic operations work with binary max heaps.
What is particularly easy for binary max heaps is finding the maximum value without extracting it, I
mean, implementing the GetMax operation. Recall that the main property of a binary max heap is the
following: for each edge, its top value is greater than or equal to its bottom value. This means that if
we go from bottom to top in our tree, the values can only increase, which in particular means that
the maximum value is stored at the root of our tree. So to implement GetMax, we just return the
value at the root, and this takes just constant time, of course. Now let's see how inserting a new
element into a binary max heap works. First of all, a new element should be attached somewhere to
our tree. We cannot attach it to the root in this case, for example, because the root already has two
children. Therefore, we just attach it to some leaf. Let's select, for example, the leaf 7 and attach a
new node to it; the new node in this case has value 32. Well, it is still a binary tree, right? Because 7
had zero children before, and now it has just one child. However, the heap property might potentially
be violated, and it is actually violated in this case, which is shown by this red edge. So for this red
edge, the value of its parent, which is 7, is less than the value of its child, which is 32. So we need to
fix it somehow. To fix it, we just let the new element sift up. This new element has value 32, which is
relatively large with respect to all other elements in this tree, so we need to move it somewhere
closer to the root. The process of moving it closer to the root is called sifting up.
So the first thing to do is to fix this problematic edge. To fix it, we perform the following simple
operation: we just swap the two corresponding elements, in this case 7 and 32. After the swap, there
is no problem on this edge. However, it might be the case that the new element 32 is still greater
than its parent, and this is the case in our toy example. The parent of 32 is now 29, which is smaller
than 32, so we still need to fix this red edge. We just repeat the process: we again swap the new
element with its parent. So we swap it, and now we see that the property is satisfied for all edges in
this binary tree.
So what we've just done is let the new element sift up.
What is important to note here is that we maintained the following invariant: at any point of time
during sifting the new element up, the heap property is violated on at most one edge of our binary
tree. If we see that there is a problematic edge, we just swap its two elements, and each time during
this process the problematic node gets closer to the root. This in particular implies that the number
of swaps required is at most the height of the tree, which in turn means that the running time of the
insertion procedure, as well as of the sifting up procedure, is O(tree height).
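The sifting-up loop just described can be sketched as follows. This sketch uses a 0-based Python list rather than the lecture's 1-based numbering, so the parent of node i is (i - 1) // 2:

```python
def sift_up(h, i):
    """Sift h[i] up until the max-heap property holds again.
    h is a 0-based Python list, so the parent of i is (i - 1) // 2."""
    while i > 0 and h[(i - 1) // 2] < h[i]:
        parent = (i - 1) // 2
        h[parent], h[i] = h[i], h[parent]   # swap with the parent
        i = parent                          # the node moved closer to the root
```

Insertion is then: append the new element as the last leaf and call `sift_up` on it.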
Now let's see how the ExtractMax procedure works for binary max heaps. First of all, recall that the
maximum value is stored at the root of the tree. However, we cannot just detach the root node,
because that would leave two subtrees; we need to somehow preserve the structure of the tree.
What is easy to detach from a binary tree is any leaf. So let's do the following: let's select any leaf of
our tree and replace the root with this leaf. In this case, this produces the following tree.
This potentially might violate the heap property, and in this case it does: the new root 12 is less than
both its children, so the property is violated on two edges. 12 is a relatively small number in this case,
so we need to move it down towards the leaves. For this we will implement a new procedure, called
SiftDown. Similarly to SiftUp, we are going to swap the new element with one of its children. In this
case we actually have a choice: we can swap it either with its left child or with its right child. By
thinking a little, we realize that it makes more sense to swap it with the left child here, because the
left child is larger than the right child: after we replace 12 with 29, the right problematic edge will be
fixed automatically, right? So this is how we are going to perform the SiftDown procedure: we select
the larger of the two children and swap the problematic node with it. As you can see, the right
problematic edge is fixed automatically, and the left edge is also fixed, just because we swapped the
two elements. However, the swap might introduce new problems closer to the bottom of the tree.
Now we see that there is still a problematic edge. In this case we have just one problematic edge: 12
is smaller than 14, but it is greater than 7, so we are safe in the right subtree. So we swap 14 with 12,
and after that we get a tree where the property is satisfied on all edges. Once again, we maintain the
following invariant: at each point of time we have at most one problematic node, and we always swap
the problematic node with the larger of its children, so as to fix both problematic edges at once,
right? And the problematic node always gets closer to the leaves, which means that the total running
time of ExtractMax, as well as of the SiftDown procedure, is proportional to the tree height.
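The sifting-down loop can be sketched in the same 0-based style as the sift-up sketch, where the children of node i are 2i + 1 and 2i + 2:

```python
def sift_down(h, i):
    """Sift h[i] down by repeatedly swapping it with the larger of its
    children; h is a 0-based list, children of i are 2i + 1 and 2i + 2."""
    n = len(h)
    while True:
        largest = i
        for c in (2 * i + 1, 2 * i + 2):
            if c < n and h[c] > h[largest]:
                largest = c                 # pick the larger child
        if largest == i:                    # heap property restored
            return
        h[i], h[largest] = h[largest], h[i]
        i = largest                         # the node moved towards the leaves
```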
Now that we have implemented both procedures, sifting up and sifting down, it is not so difficult to
also implement the ChangePriority procedure. Assume that we have an element whose priority we
would like to change, meaning we are going either to decrease or to increase its priority. To fix the
potential violations that might be introduced by changing its priority, we call either SiftUp or
SiftDown.
Let me illustrate this again on the toy example. Assume that we are going to change the priority of
this leaf 12. So we've just changed it: we increased the priority of this element to 35. In this case, we
have potentially introduced some violations and need to fix them.
We see that 35 is a relatively large number, which means that we need to sift it up, to move it closer
to the root. To do this, we just call the SiftUp procedure, which repeatedly swaps the problematic
node with its parent. In this case this produces the following sequence of swaps.
First we swap 35 with 18, which gives us the following picture. We see there is still a problem: 35 is
still larger than its parent, so we swap it again. Now 35 is smaller than its parent, and actually the
heap property is satisfied for all edges. Once again, what is important here is that at each point of
time, the heap property is violated on at most one edge of our tree. Since the problematic node gets
closer to the root after each swap, we conclude that the running time of the ChangePriority
procedure is also at most O(tree height). There is an elegant way of removing an element from a
binary max heap: it can be done just by calling two procedures that we already have. So assume that
we have a particular element that we are going to remove.
The first step is to change its priority to plus infinity, that is, to a number which is definitely larger
than all the elements in our binary max heap. When we call it, the ChangePriority procedure will sift
this element to the top of our tree, namely to the root. Then, to remove this element, it is enough to
call the ExtractMax procedure. In this particular example it works as follows. Assume that we are
going to remove the element 18, which is highlighted here on this slide. We first change its priority to
infinity. Then the ChangePriority procedure calls the SiftUp procedure, which sees that the property
is violated on this edge and swaps these two elements. Then it swaps the next two elements, and at
this point the node that we're going to remove is at the root. To remove this node, we just call the
ExtractMax procedure. Recall that the first step of ExtractMax is to replace the root node with a leaf.
So let's select, for example, 11, and replace the root with it. Then we need to call SiftDown, just to let
this new root go down closer to the leaves.
In this case, 11 is first swapped with 42, then there is still a problem on the edge from 11 to 18, so
we swap 11 with 18, and finally we swap 11 with 12. Once again, everything boils down to just two
procedures: the first is ChangePriority and the second is ExtractMax, and they both work in time
proportional to the tree height. So we conclude that the running time of the Remove procedure is
also at most O(tree height). To summarize, we were able to implement all binary max heap
operations in time proportional to the tree height, and the GetMax procedure even works in constant
time in our current implementation. So we definitely would like to keep our trees shallow, and this
will be the subject of our next video.
The first advantage of complete binary trees is straightforward, and it is exactly what we need
actually. Namely, the height of any complete binary tree with n nodes is O(log n). Intuitively, this is
clear. A complete binary tree with n nodes has the minimum possible height over all binary trees with
n nodes. Well, just because all the levels of this tree, except possibly the last one, are fully packed.
Still let me give you a formal proof.
Well, for this consider our complete binary tree and let me show a small example. So I assume that
this is our complete binary tree. So in this case, n = 10 and the number of levels, l = 4.
Well, let's first do the following thing, let's complete the last level.
And let's denote the resulting number of nodes by n'. In this particular case, the number of nodes in
the new tree is equal to 15. The first thing to note is that n' is at most 2n. This is just because in such
a tree, where all levels including the last one are fully packed, the number of nodes on the last level
is equal to the number of nodes on all the previous levels plus one. For example, here the number of
nodes on the last level is 8, and the number of nodes on all previous levels is 7. So we added at most
seven vertices.
Now, when we have such a tree where all the levels are packed completely, it is easy to relate the
number of levels to the number of nodes: namely, n' = 2^l − 1. This allows us to conclude that
l = log₂(n' + 1). Now, recall that n' is at most 2n, which allows us to write that l ≤ log₂(2n + 1), which
is, of course, O(log n).
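The whole height bound can be written in one line, using the fact n' ≤ 2n established above:

```latex
n' = 2^{l} - 1
\;\Longrightarrow\;
l = \log_2\!\left(n' + 1\right)
\le \log_2\!\left(2n + 1\right)
= O(\log n).
```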
The second advantage of complete binary trees is not so straightforward, but fortunately it is still
easy to describe. To explain it, let's consider again a toy example, a complete binary tree shown here
on this slide. Let's enumerate all its nodes going from top to bottom, and on each level from left to
right.
This way the root receives number 1, its children receive numbers 2 and 3, and so on. It turns out
that such a numbering allows us, for each vertex number i, to compute the number of its parent and
the numbers of its children using the following simple formulas. Once again, if we have node number
i, then its parent has number i divided by 2, rounded down, while its two children have numbers 2i
and 2i + 1. To give a specific example, assume that i = 4, which means that we are speaking about
this node. Then to find the number of its parent, we divide i by 2, which gives us 2.
And indeed, vertex number 2 is the parent of vertex number 4. To find the numbers of the two
children of this node, we multiply i by 2, which gives us this node, and multiply i by 2 and add 1,
which gives us this node. And these two nodes are indeed the children of vertex number 4, right?
This is very convenient: it allows us to store the whole complete binary tree just in an array. We do
not need to store, for each vertex, links to its parent and to its two children; these links can be
computed on the fly. Again, to give a concrete example, assume that we are talking about vertex
number 3, so i = 3. To find the number of its parent, we just divide i by 2 and round down. This gives
us vertex number 1, and indeed vertex number 1 is the parent of vertex number 3.
And to find the numbers of its two children, we multiply i by 2, and also multiply i by 2 and add 1.
This gives us, in this case, vertex number 6 and vertex number 7, so we know their indices in the
array.
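The three formulas can be written as tiny helper functions (names are ours), using the lecture's 1-based numbering:

```python
def parent(i):
    return i // 2        # i divided by 2, rounded down

def left_child(i):
    return 2 * i

def right_child(i):
    return 2 * i + 1
```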
Okay, we have just discussed two advantages of complete binary trees, and it would be too optimistic
to expect that these advantages come at no cost. We need to pay something, and our cost is that we
need to keep the tree complete: we need to ensure that at each point of time, our binary tree is
complete. To ensure this, let's ask ourselves which operations change the shape of our tree.
Essentially, these are only two operations, namely Insert and ExtractMax. The two operations SiftUp
and SiftDown do not change the shape of the tree; they just swap some two elements inside the tree.
Another operation which actually changes the shape is removing an element; however, it does so by
calling the ExtractMax procedure.
So on the next slide we will explain how to modify our Insert and ExtractMax operations so that they
preserve completeness of our tree.
In this video, we provide the full pseudocode of the binary max heap data structure.
Here we maintain the following three variables. H is the array where our heap is stored. maxSize is
the size of this array, and at the same time it is the maximum number of nodes in our heap. And size
is the actual size of our heap, so size is always at most maxSize.
So let me give you an example. In this case, we're given a heap of size 9, and it is stored in the first
nine cells of our array H, whose size is 13. In particular, you may notice that there are some values
to the right of position number 9, and they are actually garbage: we just don't care about any values
there. So our heap occupies the first nine positions in the array. Also, let me emphasize once again
that we store just the array H and the variables size and maxSize. The tree is given to us implicitly:
for any node, we can compute the number of its parent and the numbers of its two children, and
then access the corresponding value in this array. For example, if we have node number 3, then we
can compute the index of its left child, which is 2 multiplied by 3, and read the corresponding value,
18, from the array. These implementations show how to find, given a node i, the index of the parent
of i and of the two children of i; they just implement our formulas in a straightforward way.
To sift element i up, we do the following. While this element is not the root, namely, while i is
greater than 1, and while the value of this node is greater than the value of its parent, we do the
following. We swap this element with its parent; this is done on this line. Then we proceed with the
new position, I mean, we assign i to be equal to Parent(i) and go back to the while loop, and we do
this until the heap property is satisfied.
To insert a new element with priority p in our binary max heap, we do the following. We first check
whether we still have room for a new element, namely whether size is equal to maxSize. If it is equal,
then we just return an error. Otherwise, we do the following. We increase size by 1, then we assign H
of size to be equal to p. At this point we add a new leaf in our implicit tree to the last level, to the
leftmost position on the last level. And finally, we call SiftUp to sift this element up if needed.
To extract the maximum value from our binary max heap, we first store the value of the root of our
tree in the variable result, so result is assigned H of 1. Then we replace the root by the last leaf, that
is, by the rightmost leaf on the last level; this is done by assigning H of 1 to be equal to H of size,
okay? Then we decrease the value of size by 1, just to record that the last leaf is not in our tree
anymore. And finally, we call SiftDown for the root, because it was replaced by the last leaf, which is
potentially quite small and needs to be sifted down. The last instruction in our pseudocode returns
the result, that is, the value which was initially at the root of our tree.
Removing an element, as we've discussed already, boils down to calling two procedures that we
already have. Once again, to remove element number i, we do the following. First, we change its
priority to plus infinity, so we assign H of i to be plus infinity. Then we sift this node up, which moves
it to the root of our tree. And then we just call the ExtractMax procedure, which removes the root
from the tree and makes the necessary changes in the tree to restore its shape.
Finally, to change the priority of a given node i to the given value p, we do the following. We first
assign H of i to p, okay? Then we check whether the new priority is greater or smaller than
the old one. If it is greater, then potentially we need to sift up this node, so we just call
SiftUp. If the new priority is smaller, then we call SiftDown for this node.
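Putting the four operations together, here is a compact sketch of the whole max-heap (a hypothetical 1-based array implementation following the lecture's pseudocode; math.inf plays the role of plus infinity in Remove):

```python
import math

class BinMaxHeap:
    def __init__(self, max_size):
        self.H = [None] * (max_size + 1)  # 1-based: H[0] unused
        self.size = 0
        self.max_size = max_size

    def _sift_up(self, i):
        while i > 1 and self.H[i // 2] < self.H[i]:
            self.H[i // 2], self.H[i] = self.H[i], self.H[i // 2]
            i //= 2

    def _sift_down(self, i):
        while True:
            largest, l, r = i, 2 * i, 2 * i + 1
            if l <= self.size and self.H[l] > self.H[largest]:
                largest = l
            if r <= self.size and self.H[r] > self.H[largest]:
                largest = r
            if largest == i:
                return
            self.H[i], self.H[largest] = self.H[largest], self.H[i]
            i = largest

    def insert(self, p):
        if self.size == self.max_size:
            raise RuntimeError("heap overflow")
        self.size += 1
        self.H[self.size] = p
        self._sift_up(self.size)

    def extract_max(self):
        result = self.H[1]
        self.H[1] = self.H[self.size]
        self.size -= 1
        self._sift_down(1)
        return result

    def remove(self, i):
        # Raise the priority to +infinity, float the node to the root, extract it.
        self.H[i] = math.inf
        self._sift_up(i)
        self.extract_max()

    def change_priority(self, i, p):
        old = self.H[i]
        self.H[i] = p
        if p > old:
            self._sift_up(i)
        else:
            self._sift_down(i)
```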
References
See the chapter 6 in [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford
Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill. 2009.
>> In this video we will use binary heaps to design the heap sort algorithm, which is a fast and space
efficient sorting algorithm. In fact, using priority queues, or using binary heaps, it is not so difficult to
come up with an algorithm that sorts a given array in time O(n log n). Indeed, given an array A of size
n, we do the following. First, just create an empty priority queue. Then, insert all the elements of
our array into the priority queue. Then, extract the maximum one by one from the priority queue.
Namely, we first extract the maximum. This is the maximum of our array, so put it to the last position.
Then, extract the next maximum and put it to the left of the last position, and so on. This clearly gives
us a sorted array, right?
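This scheme can be sketched with Python's standard heapq module. heapq provides a min-heap, so we extract the minimum and fill the result from the left, which is symmetric to the max-heap description above:

```python
import heapq

def pq_sort(a):
    # Insert all elements into a priority queue (a binary min-heap),
    # then extract them one by one; this yields the array in sorted order.
    pq = []
    for x in a:
        heapq.heappush(pq, x)              # O(log n) per insertion
    return [heapq.heappop(pq) for _ in a]  # O(log n) per extraction
```

Note the extra O(n) space for pq, which is exactly the disadvantage discussed below.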
So, we know that if we use binary heaps as an implementation for priority queue, then all operations
work in logarithmic time. So, this gives us an algorithm with a running time big o of n log n. And recall
that this is asymptotically optimal for algorithms that are comparison-based. And this algorithm is
clearly comparison-based, right? Also, note that this is a natural generalization of the selection sort
algorithm. Recall that in selection sort, we proceed as follows. Given an array, we first scan
the whole array to find the maximum value. Then we take this maximum value and swap it with the
last element. Then we forget about this last element and consider only the first n-1 elements.
Again, by scanning this array, we find the maximum value, and we swap it with the last element in this
region, and so on. So, here in the heap sort algorithm, instead of scanning the array to find the
maximum value at each iteration, we use a smart data structure, namely a binary heap. So, the only
disadvantage of our current algorithm is that it uses additional space, namely, additional space to
store the priority queue.
Okay? So, in this lesson we will show how to avoid this disadvantage. Namely, given an array A, we
will first permute its elements somehow, so that the resulting array is actually a binary heap, that
is, it satisfies the binary heap property. And then, we will sort this array, again, just by calling
ExtractMax n minus 1 times.
So, we have n log n running time. We already have everything to present the in-place heap
sort algorithm. Given an array A of size n, we first build a heap out of it. Namely, we permute its
elements so that the resulting array corresponds to a complete binary tree which satisfies the heap
property on every edge. We do this just by calling the BuildHeap procedure. In particular, this
BuildHeap procedure assigns the value n to the variable size. Then, we repeat the following process n
minus 1 times. First, recall that just after calling BuildHeap, the first element of our array is
a maximum element, right? So, we would like to put it to the last position of our array, so we just
swap A[1] with A[n]. And currently, A[n] is equal to A[size], okay? So, we swap them. And then, we
forget about the last element, so we decrease size by one. So, we say that now our heap occupies
the first n-1 elements. And since we swapped the last element with the first element, we potentially
need to sift down the first element, so we just call SiftDown for element number one. And we
proceed in a similar fashion. I mean, now the heap occupies the first n-1 positions. The largest
element among the first n-1 elements is the first element, so we swap it with element n-1, forget
about element n-1 by reducing size by 1, and then SiftDown the first element. Okay? So, we
repeat this procedure n-1 times, each time finding the currently largest element. So, once again, this
is an improvement of the selection sort algorithm, and it is an in-place algorithm.
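The in-place procedure above can be sketched as follows (a hypothetical implementation using Python's 0-based indices rather than the lecture's 1-based ones, so the root is a[0] and the children of node i are at 2i+1 and 2i+2):

```python
def heap_sort(a):
    n = len(a)

    def sift_down(i, size):
        # 0-based children of node i: 2*i + 1 and 2*i + 2.
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # BuildHeap: sift down every internal node, from the last one to the root.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)

    # Repeatedly move the current maximum (the root) to the end of the
    # shrinking prefix, then restore the heap property.
    size = n
    for _ in range(n - 1):
        a[0], a[size - 1] = a[size - 1], a[0]
        size -= 1
        sift_down(0, size)
```

The sort happens entirely inside the given array: the heap occupies a shrinking prefix and the sorted output grows as a suffix.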
So, once again, let me state some properties of the resulting algorithm, which is called heap sort. It is
in-place: it doesn't need any additional memory, everything happens inside the given array A. This
is an advantage of this algorithm. Another advantage is that its running time is O(n log n). It is as
simple as it is optimal. So, this makes it a good alternative to the quick sort algorithm. In practice,
quick sort is usually a little faster. However, the heap sort algorithm has worst case running time
O(n log n), while the quick sort algorithm has average case running time O(n log n). For this reason, a
popular approach in practice is the following. It is called the IntroSort algorithm. You first run the
quick sort algorithm. If it turns out to be slow, I mean, if the recursion depth exceeds c log n for some
constant c, then you stop the current call to quick sort and switch to heap sort, which is
guaranteed to have running time O(n log n). So, in this implementation, in most cases your algorithm
works like quick sort. And even in those unfortunate cases where quick sort has running time larger
than n log n, you stop it at the right point in time and switch to heap sort. This gives an algorithm
which in many cases behaves like quick sort, and at the same time has worst case running time
O(n log n).
Building a Heap
In this video, we are going to refine our analysis of the BuildHeap procedure. Recall that we estimated
the running time of the BuildHeap procedure as O(n log n), because it consists of roughly n over
2 calls to the SiftDown procedure, whose running time is O(log n). So we get n over 2 multiplied by log n,
which is of course O(n log n). Note, however, the following thing. If we call SiftDown for a node which
is already quite close to the leaves, then the running time of sifting it down is much less than log n,
right? Because it is already close to the leaves, the number of swaps until it reaches the leaves
cannot be larger than the height of the corresponding subtree, okay? Note also the following thing.
In our tree, we actually have many nodes that are close to the leaves. We have just one
node, which is exactly the root, whose height is log n. We have two nodes whose height is log n minus
1, we have four nodes whose height is log n minus 2, and so on. And we have roughly n over 4 nodes
whose height is just 1. Okay? So this raises the question whether our estimate of the running time of
the BuildHeap procedure was too pessimistic.
We will see on the next slide. Let's just estimate the running time of the BuildHeap procedure a little
bit more accurately. Okay, so this is our heap, shown schematically. This is the last level, which is
probably not completely filled, but all the leaves on the last level are in the leftmost positions. On the
very top level, we have just one node, and sifting down this node costs logarithmic time. At the same
time, on the last level, we have at most n over 2 nodes, and sifting each of them down makes at most
one swap. Actually, we do not even need one swap, just zero swaps, but let's be generous and allow
one swap. On the next level, we have at most n over 4 nodes, and sifting each of them down costs at
most two swaps, and so on. So let's just compute the sum of everything: we have n over 2 nodes for
which the cost of the SiftDown procedure is 1, n over 4 nodes on the next level for which sifting down
makes at most two swaps, n over 8 nodes on the next level for which sifting down costs at most three
swaps, and so on. Now let's do the following. Let's upper bound this sum by the following sum. First of
all, let's take the multiplier n out of the sum. What is left is the following: 1 over 2 + 2 over 4 + 3 over
8 + 4 over 16 + 5 over 32, and so on. This can be upper-bounded by the sum, from i equal to 1 to
infinity, of the fraction i divided by 2 to the i. Once again, in our case, in the running time of
BuildHeap, this sum is finite.
The maximum value of i is log n; we do not have any nodes at height larger than log n. But we just
upper-bound it by an infinite sum, where we consider all possible values of i. And even for
this infinite sum, we will show that its value is equal to 2, which gives us that the running time of
the BuildHeap procedure is actually at most 2n, that is, O(n).
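A quick numeric sanity check of this bound: after factoring out n, the per-level SiftDown costs form exactly the series 1/2 + 2/4 + 3/8 + ..., whose partial sums approach 2.

```python
# Partial sums of sum_{i>=1} i / 2^i: the lecture upper-bounds the total
# number of BuildHeap swaps by n times this series, which converges to 2.
partial = sum(i / 2 ** i for i in range(1, 60))
assert abs(partial - 2) < 1e-6  # so BuildHeap makes at most about 2n swaps
```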
Overview
Hello and welcome to the next lesson of the data structures class. It is devoted to disjoint sets.
As a first motivating example, consider the maze shown on the slide. It is basically just a grid
of cells with walls between some pairs of adjacent cells. A natural question for such a maze is: given
two points, given two cells in this maze, is there a path between them? For example, for the
two cells shown on the slide there is a path, and it is not too difficult to construct it. Let's do this
together. We can go as follows. And there is actually another path here; we can also go this way.
Great. On the other hand, there is no path between these two points shown on the slide, and to show
this we might want to construct the set of all points that are reachable from B. Let's again do this,
so let's just mark all the points that are reachable from B. It is not difficult to see that we marked
every single point which is reachable from B. And we now see that A does not belong to this set,
which justifies that A is not reachable from B in this case.
Slides
Download the slides for this lesson:
06_2_disjoint_sets_1_naive.pdf
References
See the chapters 21.1 and 21.2 in [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L.
Rivest, Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill. 2009.
Hi. In the previous video we considered a few naive implementations of the disjoint sets data
structure. In one of them, we represented each set as a linked list. Let me give you a small example.
These four elements are organized into a linked list, and we treat the tail of this list, its last
element, as the ID of the corresponding set. This is a well-defined ID, because it is unique for any
list, and it can easily be reached from any other element of the corresponding set. So if we need to
find the ID of the set that contains this element, we just follow the next pointers shown here until we
reach the tail of the list. Another advantage is that merging two sets is very easy in this case. Assume
that this is our first set, and the second set looks as follows. Then, to merge these two sets, we just
append one of the lists to the other one, like this. The first advantage of this merging is that it clearly
takes constant time: we just change one pointer. Another advantage is that it updates the ID of the
resulting list automatically. Now the tail of the second list serves as the ID of the resulting list, and it
can still be reached from any element of this list, just by following the next pointers. The main
disadvantage of this approach is that over time, lists get longer and longer, which in turn implies that
the Find operation gets slower and slower.
The general setting is the following. Each set is going to be represented as a rooted tree. We will
treat the root of each tree as the ID of the corresponding set. For each element, we will need to know
its parent, and this will be stored in an array parent of size n. Namely, parent of i will be equal to j if
element j is the parent of i, or, in case i is a root, parent of i will be equal to i. So this is a toy
example. Here we have three trees, and there are three roots: 5, 6, and 4. These three trees are
stored in the array parent as follows. For example, to indicate that 4 is a root, we store 4 in the fourth
cell of this array. To indicate that 9 is the parent of 7, we put 9 into the 7th cell.
Recall that MakeSet of i creates a singleton set consisting of just the single element i. To do
this, we just assign parent of i to be equal to i. This creates a tree whose only element is i, and i is
the root of this tree. For this reason, we assign parent of i to be equal to i. The running
time of this operation is, of course, constant.
To find the root of the tree that contains a given element i, we just follow the parent links from the
node i until we reach the root. This can be done as follows. While i is not equal to parent[i], namely
while i is not the root of the corresponding tree, we replace i by its parent. So each time, we get
closer to the root, and eventually we reach the root. At this point, we return the resulting element.
The running time of this operation is, of course, at most the height of the corresponding tree.
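A minimal sketch of this representation (the parent array stored as a plain Python list; elements are numbered 0 to n-1 here rather than from 1 as on the slides):

```python
def make_set(parent, i):
    # A singleton set: i is the root of its own one-node tree.
    parent[i] = i

def find(parent, i):
    # Follow parent links until we reach a node that is its own parent.
    while i != parent[i]:
        i = parent[i]
    return i
```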
Now, we need to design a way of merging two trees, and there is a very natural idea for doing this.
We have two trees; let's just take one of them and hang it under the root of the other one. Let me
illustrate it with a small example. Assume that this is our first tree. It contains just three nodes, and
this is the root of this tree, so it points to itself. And this is our second tree. This is the root, so it
points to itself again. To merge these two trees we just change one pointer. Namely, we say that now
this node is not the root anymore, but its parent is this node. So we hang the left tree under the root
of the right tree. Once again, this node is not the root anymore, while this node is the root of the
resulting tree. At this point, there is a natural question. We can hang the left tree under the root of
the right tree. But also, vice versa, we can hang the right tree under the root of the left tree. So which
one should we choose? After thinking a little bit, we realize that it makes sense to hang the tree whose
height is smaller under the root of the tree whose height is larger. The reason for this is that we would
like to keep our trees shallow, and in turn the reason for this is that the height of the trees in our
forest influences the running time of the Find operation. Namely, the worst case running time of the
Find operation is at most the maximal height of a tree in our forest.
To give a specific example, let's consider the two trees shown on the slide. In this case, we have a
tree of height one and a tree of height two. Assume that we call Union of 3 and 8. In this case we
need to merge these two trees, and as we discussed, there are two possibilities for doing this: either
we hang the left tree under the root of the right tree, or vice versa, we hang the right tree under the
root of the left tree. The results of these two cases are shown here on the slide, and you see that in
the last case the height of the tree increased. This is not something that we want, because, as we've
discussed, the height of the tree influences the worst case running time of the Find operation. So this
illustrates that to keep our trees shallow, when merging two trees we would like to hang the tree
whose height is smaller under the root of the tree whose height is larger.
Union by Rank
Okay, when merging two trees we're going to hang the shorter one under the root of the taller one.
This means that when merging two trees we need a way to quickly find the height of both trees.
Instead of just computing them, we're going to keep the height of each subtree in our forest in a
separate array called rank. Namely, rank of i is equal to the height of the subtree rooted at i. The
reason we call it rank will become clear a little bit later. Let me also mention that this way of merging
two trees, based on their heights, is called the union by rank heuristic.
To maintain the rank, we need a small addition to our MakeSet implementation: namely, when
creating a singleton set, we also set rank of i to be equal to zero. This reflects the fact that it is
currently just a tree containing one node, that is, a tree of height zero.
We do not need to change Find: the Find operation doesn't change rank, and it also doesn't use rank
in any way. To merge the two trees containing the given two elements i and j, we do the following.
We first find the roots of the corresponding two trees by calling the Find operation two times.
We store these roots in variables i_id and j_id. We then check whether i_id is equal to j_id. If they are
equal, this means that elements i and j already lie in the same set, so we just return in this case. This
is done in the corresponding if statement. We then check whether the height of the tree containing
element i is larger than the height of the tree containing element j. If it is larger, then we hang the
tree with root j_id under the root i_id. This is done as follows: parent of j_id is set to i_id.
Otherwise, we do the opposite thing: we just assign parent of i_id to be equal to j_id.
The last thing is that we need to check whether the heights of the corresponding two trees are equal.
Let me illustrate this again with a small example. Assume that we are merging the following two
trees. In this case the height of each of these elements is zero, and the height of this element is 1. So
in this case the ranks of the corresponding roots are equal. To merge these two trees we do the
following. We just hang the left tree under the root of the right tree. As you can see, in this case the
height of the resulting tree actually increases, and this is the only case when the Union operation
increases the height of a tree. So in this case, initially, the longest path contained just one edge; now
there is a path that contains two edges. So we need to update the rank, and this is done in the last
check: if initially the ranks of the two trees that are going to be merged are equal, we hang one of
them under the root of the other one and increase the rank of the resulting tree by one.
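The Union procedure with the union by rank heuristic, as just described, can be sketched as follows (find as in the earlier fragment; parent and rank are plain lists):

```python
def find(parent, i):
    # Follow parent links until we reach the root of the tree.
    while i != parent[i]:
        i = parent[i]
    return i

def union(parent, rank, i, j):
    # Hang the shorter tree under the root of the taller one.
    i_id, j_id = find(parent, i), find(parent, j)
    if i_id == j_id:
        return                  # i and j already lie in the same set
    if rank[i_id] > rank[j_id]:
        parent[j_id] = i_id
    else:
        parent[i_id] = j_id
        if rank[i_id] == rank[j_id]:
            rank[j_id] += 1     # the only case when the height grows
```

Running the lecture's six-element example (Union(2,4), Union(5,2), Union(3,1), Union(2,3), Union(2,6)) ends with 1 as the root of rank 2, matching the slides.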
Let's consider a small example. In this case we have six elements. Let's call MakeSet for each of these
elements. This fills the data structure as follows. Currently, each element is its own parent, right? So
each current set is just a singleton set. Also, the height of each subtree in our data structure is
currently equal to 0. Now let's call Union(2,4). In this case, the rank of the subtree rooted at 2 is
equal to 0, and the rank of the subtree rooted at 4 is equal to 0. So it doesn't matter which one to
hang under the root of the other one; let's hang 2 under 4. This changes the data structure as follows.
Now the parent of 2 is 4, and the rank of the subtree rooted at 4 is equal to 1. Okay, now let's call
Union(5,2). In this case the height of the tree that contains the element number 2 is equal to 1, right?
While the height of the tree that contains element number 5 is equal to 0. So, in this case we're going
to hang 5 under 4. This changes only one cell of the data structure: now 4 is the parent of 5, and it
doesn't change any rank in our forest.
Okay, now let's call Union(3,1). This is done as follows: now 1 has rank 1, and the parent of 3 is
equal to 1, okay? Now, let's call Union(2,3). In this case, 2 lies in the tree whose root is 4, and
currently the rank of 4 is equal to 1. Also, 3 lies in a set whose root is 1, and currently the rank of 1 is
equal to 1. This means that after merging these two trees we will get a tree of height 2. We do this as
follows: now 1 is the root of the resulting tree, and its rank is equal to 2. Finally we call Union(2,6),
and this will just attach 6 to 1, as follows.
In our current implementation, we maintain the following important invariant. At any point of time,
and for any node i, rank of i is equal to the height of the subtree rooted at this node i, right?
We will use this invariant to prove the following lemma: the height of any tree in our forest is at most
the binary logarithm of n. This will immediately imply that the running time of all operations with our
data structure is at most logarithmic, right? To prove this lemma we will prove another lemma shown
here on this slide.
We're going to prove that if we have a tree in our forest whose height is k, then this tree contains at
least two to the k nodes. This implies the first lemma as follows. Assume that some tree has height
strictly greater than the binary logarithm of n. Using the second lemma, it would then be possible to
show that this tree contains more than n nodes, right? Which would lead to a contradiction with the
fact that we only have n objects in our data structure.
Now we are going to prove the second lemma by induction on k. Recall that we claim that any tree
of height k in our forest contains at least 2 to the k nodes. When k is equal to zero, we have a tree of
height 0, which means that it contains just one node. So, in this case, the statement clearly holds.
Now, to prove the induction step, recall that the only way to get a tree of height k is to merge two
trees whose heights are both equal to k-1. By the induction hypothesis, both of these trees contain at
least 2 to the k-1 nodes, which means that the resulting tree contains at least 2 to the k-1 plus 2 to
the k-1 nodes, which is exactly equal to 2 to the k, right? So the lemma is proved.
To conclude, the running time of both Union and Find operations in our current implementation is at
most logarithmic. Why so? Well, just because we keep our trees shallow, so that the height of any
tree in our forest is at most logarithmic. This immediately implies that the running time of any Find
operation is big O of log n. Recall also that the Union operation consists of two calls to the Find
operation plus a few constant time operations, which means that the running time of Union is also
big O of log n. In the next video, we will see another beautiful heuristic which will decrease the
running time of both these operations to nearly a constant.
Path Compression
Now it remains to estimate the last term, where we account for all the edges traversed during the m
calls to the Find operation, where we go from a node i to its parent j such that, first of all, j is not
the root, and the log star of rank of i is equal to the log star of rank of j. What we're going to show is
that the total number of such edges is upper bounded by big O of n multiplied by log star of n. Note
that this is even better than what we need: what we need is an upper bound of m log star of n. Recall
that m is at least n, just because m is the total number of operations, while n is the number of calls to
the MakeSet operation.
To estimate the required term, consider a particular node i, and assume for concreteness that its
rank lies in the interval from k plus one to two to the k. Recall that this is the form of the intervals on
which the log star function is constant, okay?
Now let's compute the total number of nodes whose rank lies in such an interval. We know that the
total number of nodes whose rank is equal to k plus one is at most n divided by two to the k plus one.
The total number of nodes whose rank is equal to k plus two is at most n divided by two to the k plus
two, and so on. So the total number of nodes whose rank lies in this interval is at most n divided by
two to the k.
Okay. The next observation is that each time we call Find of i, node i is adopted by a new parent. At
this point we know that we have a node i whose parent j is not the root; yes, this is essential. This
means that when we go up during the call to Find of i, we find the actual root, and at this point we
reattach node i to this root, and this new parent has strictly larger rank than the old one. This in turn
means that after at most 2 to the k calls to Find of i, node i will be adopted by a new parent whose
rank, for sure, does not lie in this interval, just because the maximum rank in this interval is 2 to the k.
So if we increase the rank of the parent of i at least 2 to the k times, it will be greater than 2 to the k
for sure.
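The path compression heuristic analyzed here changes only Find: after locating the root, every node on the traversed path is reattached directly to it. A short recursive sketch:

```python
def find(parent, i):
    # Recursively find the root, then reattach i directly to it, so that
    # the whole traversed path ends up pointing at the root.
    if i != parent[i]:
        parent[i] = find(parent, parent[i])
    return parent[i]
```

After one call on a long chain, every node on that chain points straight at the root, so subsequent Finds on those nodes take constant time.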
Slides
Download the slides for this lesson:
06_2_disjoint_sets_2_efficient.pdf
References
See section 5.1.4 of Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms
(1st Edition). McGraw-Hill Higher Education. 2008.
Also see this visualization of Disjoint Sets with and without Path Compression and Union by Rank
heuristics.
Hash Tables
In this module you will learn about a very powerful and widely used technique called hashing. Its
applications include implementation of programming languages, file systems, pattern search,
distributed key-value storage and many more. You will learn how to implement data structures to
store and modify sets of objects and mappings from one type of objects to another one. You will see
that naive implementations either consume a huge amount of memory or are slow, and then you will
learn to implement hash tables that use linear memory and work in O(1) time on average! In the end,
you will learn how hash functions are used in modern distributed systems and how they are used to
optimize storage of services like Dropbox, Google Drive and Yandex Disk!
Applications of Hashing
Hi.
In this module, we'll study hashing, and hash tables.
Hashing is a powerful technique with a wide range of applications.
In this video, we will learn about some examples of those applications,
just to have a taste of it.
The first example that comes to mind is, of course, programming languages.
In most of the programming languages, there are built-in data types or
data structures in the standard library that are based on hash tables.
For example, dict or dictionary in Python, or HashMap in Java.
Another case is keywords of the language itself.
When you need to highlight them in the text editor, or when the compiler needs to
separate keywords from other identifiers in the program to compile it,
it needs to store all the keywords in a set.
And that set is usually implemented using a hash table.
Another example is file systems.
When you interact with a file system as a user, you see the file name,
maybe the path to the file.
But to actually store the correspondence between the file name and path, and
the physical location of that file on the disk,
the system uses a map, and that map is usually implemented as a hash table.
Another example is password verification.
When you use some web service and you log into that and you type your password,
actually if it is a good service, it won't send your password in clear text through
the network to the server to check if that's the correct password or not,
because that message could be intercepted and then someone will know your password.
Instead, a hash value of your password is computed.
On your client side and then sent to the server and
the server compares that hash value with the hash value of the stored password.
And if those coincide, you get authenticated.
Special cryptographic hash functions are used for that.
It means that it is very hard to try and
find another string which has the same hash value as your password.
So you are secure.
Nobody can actually construct a different string which has the same hash value as
your password and then log in as you in the system, even if he intercepted
the message with the hash value of your password going to the server.
Another example, storage optimization for online cloud storages,
such as Dropbox, Google Drive or Yandex.Disk.
Those use a huge amount of space to store all the user files and
that can actually be optimized using hashing.
We will discuss this example further in the lectures of this module.
Hi, in this video, we will introduce a problem about a web service and the IP addresses of its clients.
We will use this problem to illustrate different approaches throughout the whole lesson. Suppose you
have a web service with many, many clients who access your service through the Internet from
different computers. On the Internet, there is a system which assigns a unique address to each
computer in the network, just like every house in a city has its own address. Those addresses of
computers are called IP addresses, or just IPs. Every IP address looks like this: four integers separated
by dots. Each of the four integers is from 0 to 255, so it can be stored in eight bits of memory.
And the whole IP address can be stored in 32 bits of memory, as the standard integer type in C++ or
Java. So there are 2 to the power of 32 different IP addresses, which is roughly 4 billion.
Play video starting at 56 seconds and follow transcript0:56
Recently, the Internet became so big that 4 billion addresses are no longer enough for all of the
computers in the network. That's why people designed a new address system called IPv6. The number of
addresses there is 2 to the power of 128, which is a number with 39 digits, and it will be sufficient for
a long time.
Play video starting at 1 minute 16 seconds and follow transcript1:16
In this problem, we will be talking about the old system, called IPv4, which is still in use and which
contains only 2 to the power of 32 different IP addresses.
Play video starting at 1 minute 27 seconds and follow transcript1:27
When somebody accesses your web service, you know from which IP address he or she accessed it.
And you store this information in a special file called an access log. You want to analyze all the activity,
for example, to defend yourself from attacks. An adversary can try to bring down your service by sending
lots and lots of requests from his computer to your service, so that it doesn't survive the load and fails.
This is called a Denial of Service attack. And you want to be able to quickly notice the pattern that
there is an unusually high number of requests from the same IP address during some period of time, for
example, the last hour. And to do that, you want to analyze your access log.
Play video starting at 2 minutes 11 seconds and follow transcript2:11
You can think of your access log as a simple text file with many, many lines. Each line contains the
date and time of the access and the IP address from which the client accessed your servers. And
you want to be able to quickly answer queries like: did anybody access my service from this
particular IP address during the last hour? How many times did they access my service? And how
many different IPs were used to access the service during the last hour?
Play video starting at 2 minutes 39 seconds and follow transcript2:39
To answer those questions, we'll need to do some log processing. But of course, we don't want to
process a whole hour of logs each time we want to answer such a simple question, because one
hour of logs can easily contain tens of thousands, hundreds of thousands, or even millions of lines,
depending on the load of your web service. We want to do that much faster.
Play video starting at 3 minutes 2 seconds and follow transcript3:02
So to do that, we'll keep counts. For each IP address, we'll keep a counter that says how many times
exactly that IP address appears in the last one hour of the access log, or in other words, how many
times during the last hour clients accessed your service from that particular IP address.
Play video starting at 3 minutes 23 seconds and follow transcript3:23
And we'll store it in some data structure C, which is basically some data structure to store the
mapping from IP addresses to counters. We don't know yet how to implement that data structure C.
We will discuss that further. We will update the counters corresponding to IP addresses every second.
For example, suppose now is 1 hour, 45 minutes and 13 seconds from the start of the day (we'll ignore
the date field in the access log for the sake of simplicity). Then we need to increment the counters
corresponding to the IP addresses in the last two lines of the log, because those are new lines. We
also need to remember to decrement the counters corresponding to the IP addresses in the old lines
of the log. For that we'll look at the lines exactly 1 hour ago in the log: for the lines which are
older than that, we've already decremented the counters in the previous seconds, and for the lines
which are more recent than that, we don't yet need to decrement the counters, because the IPs
in those lines are still in the 1-hour window ending in the current second. So we'll decrement the
counters corresponding to the lines which are exactly 1 hour before the current moment.
Play video starting at 4 minutes 34 seconds and follow transcript4:34
Now let's look at the pseudo code. In the main loop we have the following variables. log represents
the access log. We will think of it as an array of log lines. Each log line has two fields. Time and IP
address. C is some mapping from IPs to counters. We still don't know how to implement that but we
suppose that we have some data structure for that.
Play video starting at 4 minutes 59 seconds and follow transcript4:59
i is an index in the log which points to the first unprocessed log line. So when a new second starts,
we'll need to start incrementing counters corresponding to lines starting from i and further in log.
Play video starting at 5 minutes 13 seconds and follow transcript5:13
j is the first or the oldest line in the current 1 hour window. So that when the next second starts we'll
need to decrement counters for some of the lines starting from line number j. We initialize i and j
with 0 and C with an empty mapping, because there is nothing to store at the start. And then each
second, we call the procedure UpdateAccessList, and we pass it the access log to read data from. We
also pass i and j, which we will use inside and also update. And we pass the data structure C, which it
is our goal to update.
Play video starting at 5 minutes 49 seconds and follow transcript5:49
So now let's look at the pseudo code for UpdateAccessList. It consists of two parts. The first part deals
with the new lines and the second part deals with the old lines.
Play video starting at 5 minutes 59 seconds and follow transcript5:59
New lines start from line number i, which is the first unprocessed line. We look at this line and
increase the counter corresponding to the IP in this line using our data structure C, and then we go on
to the next line. We proceed with this while the time written in log line i is still less than or equal
to the time when UpdateAccessList was launched, and then we stop processing new lines. Now we
move on to the old lines. How do we determine that a line is old enough to decrement the counter?
Play video starting at 6 minutes 30 seconds and follow transcript6:30
We compute the time now, which we assume is measured in seconds. Then we need to subtract
exactly one hour from that, which is 3600 seconds. And if the time written in line j is less than or
equal to that, we need to decrement the corresponding counter. So we'll start with line number j,
which is the first line in our 1-hour window. We check that it is old enough to decrement the counter,
we decrement the counter if that's the case, and then we move on to the next line. In the end, when
we stop in this while loop, j will point again to the first, or oldest, line in the current 1-hour window.
Play video starting at 7 minutes 11 seconds and follow transcript7:11
So we've implemented the updating procedure correctly. Now, how do we answer the question whether
a particular IP was or was not used to access our service during the last hour? That is really easy. If
the counter corresponding to that IP is more than 0, then this IP was used during the last hour.
Otherwise the counter will be 0.
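The update and query procedures described above can be sketched in runnable Python. This is a minimal sketch of the lecture's pseudocode, with function names and the (time, ip) log format chosen here for illustration; a plain dict stands in for the still-unspecified data structure C.

```python
def update_access_list(log, i, j, counters, now):
    """One per-second update of the 1-hour sliding window.

    log      -- list of (time_in_seconds, ip) pairs, ordered by time
    i        -- index of the first unprocessed log line
    j        -- index of the oldest line in the current 1-hour window
    counters -- mapping ip -> number of accesses during the last hour
                (a dict stands in for the data structure C)
    Returns the updated (i, j).
    """
    # First part: increment counters for the new lines.
    while i < len(log) and log[i][0] <= now:
        ip = log[i][1]
        counters[ip] = counters.get(ip, 0) + 1
        i += 1
    # Second part: decrement counters for lines that fell out of
    # the window (older than 3600 seconds).
    while j < len(log) and log[j][0] <= now - 3600:
        counters[log[j][1]] -= 1
        j += 1
    return i, j


def accessed_last_hour(counters, ip):
    # The IP was seen during the last hour iff its counter is positive.
    return counters.get(ip, 0) > 0
```

Calling update_access_list once per second keeps the counters in sync with the 1-hour window, exactly as described above.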
So, we've implemented all the procedures necessary to answer the questions, except for one small detail:
we don't know how to implement the data structure C.
Play video starting at 7 minutes 47 seconds and follow transcript7:47
And we will discuss that in the next lectures.
Direct Addressing
Hi. In this video we will talk about direct addressing, which is the first step on the way to hashing.
Remember this pseudo code from the last video. We implemented the procedure UpdateAccessList
using a data structure C, which stores a counter for any IP address.
Play video starting at 18 seconds and follow transcript0:18
Now the question is, how to implement the data structure C itself.
Play video starting at 23 seconds and follow transcript0:23
The idea here is that there are 2 to the power of 32 different IP addresses, according to the IPv4
format.
Play video starting at 31 seconds and follow transcript0:31
And we can actually convert each IP to a 32-bit integer, and it will be a one-to-one correspondence
between all possible IPs and all numbers between 0 and 2 to the power of 32 minus 1.
Play video starting at 46 seconds and follow transcript0:46
Thus, we can create an array A of size exactly 2 to the power of 32, with indices from 0 to 2 to the
power of 32 minus 1. Then for each IP, there will be exactly one position in this array
corresponding to this IP, and we will be able to use the corresponding entry of array A
as the counter for this IP.
Play video starting at 1 minute 13 seconds and follow transcript1:13
Now, how do we actually convert IP addresses to integers?
If you look at this picture, you will see that any IP address actually consists of 4 integer numbers,
which are all at most 255. And each of them corresponds to 8 bits, or 1 byte, in the total 4-byte, or
32-bit, integer number. Basically, if you just concatenate the 8 bits corresponding to the first number
with the 8 bits corresponding to the second number, then the third number and the fourth
number, you will get 32 bits. And if you then convert this string of 32 bits into decimal form,
you will get an integer number in the form which we are used to. For example, if you take a very
simple IP address, 0.0.0.1, it will convert to integer 1, because all the higher bits are zeroes, and in the
lowest byte the only bit set is the lowest bit, which corresponds to the number 1. If we convert the
number in the picture to decimal form, we will get 2886794753. Now, what do you think will be
the integer number corresponding to this IP? The correct answer is 1168893508.
Play video starting at 2 minutes 41 seconds and follow transcript2:41
Now, here is the formula and the code to convert an IP address to an integer number. Why does it work?
Well, the lowest eight bits are in the fourth number of the IP address, so we use them
unchanged.
Play video starting at 2 minutes 58 seconds and follow transcript2:58
The next eight bits are in the third number of the IP. But to use them, we need to move them to the left
by eight positions in binary form, and to do that, we need to multiply the corresponding integer
number by 2 to the power of 8.
Play video starting at 3 minutes 15 seconds and follow transcript3:15
The next eight bits are in the second number of the IP.
Play video starting at 3 minutes 19 seconds and follow transcript3:19
And to use them we need to move them to the left by 16 positions in binary form. To do that, we
multiply the corresponding integer number by 2 to the power of 16, and so on. This gives us a one-to-one
correspondence between IP addresses and integer numbers.
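In Python, the conversion can be written as follows (the function name is chosen here for illustration; left bit-shifts by 8, 16 and 24 positions implement the multiplications by 2 to the power of 8, 16 and 24):

```python
def ip_to_int(ip):
    """Convert 'a.b.c.d' to the 32-bit integer a*2^24 + b*2^16 + c*2^8 + d."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d
```

For example, ip_to_int("0.0.0.1") is 1, and the address 172.16.254.1 converts to 2886794753, the value mentioned above.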
Play video starting at 3 minutes 36 seconds and follow transcript3:36
Now, we can rewrite the code for UpdateAccessList using array A instead of the mysterious data
structure C.
Play video starting at 3 minutes 44 seconds and follow transcript3:44
And the only thing that changes is the incrementing and decrementing of the counters. When we
need to increment the counter corresponding to the IP in the i-th line, we first convert this IP to an integer
number from 0 to 2 to the power of 32 minus 1, and then we increase the entry of the array A
at this index. Note that each IP is converted to its own integer number, so there will be no
collisions between different IP addresses, where we try to increment the counter for one IP address and
by chance increment the counter corresponding to another IP address. All IP addresses are uniquely
mapped into integers from 0 to 2 to the power of 32 minus 1. We do the same thing when we
need to decrement a counter. So basically, in the position in array A corresponding to any IP
address, we will store the counter which measures how many times your service was accessed from
this particular IP during the last hour.
Now, how do we answer the question whether this IP was or was not used during the last hour to access
your service? This is very easy. We first convert the IP to the corresponding position in the array A,
and then we look at the counter at this position.
Play video starting at 5 minutes 7 seconds and follow transcript5:07
If the IP was used, then the counter will be more than zero. Otherwise it will be exactly zero.
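The whole direct addressing scheme fits in a few lines. In this sketch (the class name is ours), universe_size would be 2 to the power of 32 for real IPv4 addresses; the test uses a small universe only so that the example stays lightweight.

```python
class DirectAddressingCounters:
    """Direct addressing: one array cell per possible key.

    Updates and queries are O(1), but memory is O(universe_size)
    even when only a handful of keys are active."""

    def __init__(self, universe_size):
        # For real IPv4 this array would have 2**32 cells.
        self.counts = [0] * universe_size

    def increment(self, key):
        self.counts[key] += 1

    def decrement(self, key):
        self.counts[key] -= 1

    def accessed(self, key):
        # The key occurred in the window iff its counter is positive.
        return self.counts[key] > 0
```

A key would be the integer produced by the IP-to-integer conversion above; the cost of the giant array is exactly the drawback discussed next.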
Play video starting at 5 minutes 14 seconds and follow transcript5:14
So, now let's look at the asymptotics of this implementation. UpdateAccessList is as fast as we can do:
it is constant time per log line, because for each log line, we only look at some position in the array
and increment or decrement the counter there.
Play video starting at 5 minutes 34 seconds and follow transcript5:34
AccessedLastHour is also constant time, because the only thing we do is look at some position in
the array, which is a constant-time operation, and compare it with zero. But there is a drawback.
Play video starting at 5 minutes 46 seconds and follow transcript5:46
Even if during the last hour, for example, at night, there are only five, or 10, or 100 IPs from
which your clients use the service, you will still need 2 to the power of 32 memory cells to store that
information.
Play video starting at 6 minutes 2 seconds and follow transcript6:02
And in general, if you take, for example, the new IP protocol, IPv6, it already contains 2 to the power of
128 different IP addresses. And if you create an array of that size, it won't fit in the memory of your
computer.
Play video starting at 6 minutes 19 seconds and follow transcript6:19
In the general case, we need O(N) memory, where N is the size of our universe. The universe is the set of
all objects that we might possibly want to store in our data structure. It doesn't mean that every one
of them will be stored in our data structure; but if at some point we might want to store it, we
have to count it. So for example, even if some IP address never accesses your service, you will still
have to have a cell in your array for this particular IP in the direct addressing method. So, this method
only works when the universe is somewhat small, and we need to invent something else to work with
universes which are bigger than that, or even infinite, such as, for example, the universe of all
possible words, all possible strings, or all possible files on your computer. And we will talk about that
in the next videos.
List-based Mapping
Hi, in this video we will study another approach to the IP addresses problem. In the last video we
understood that the direct addressing scheme sometimes requires too much memory. And why is
that? Because it tries to store something for each possible IP address while we're only interested in
the active IP addresses. Those from which at least some user has accessed our service during the last
hour. So the first idea for improving the memory consumption is: let's just store the active IPs and
nothing else. Another idea is that if our array-based approach from the last video has failed, then
let's try to use a list instead of an array. So let's store all the IP addresses which are active in a list,
sorted by the time of access, so that the first element in the list corresponds to the oldest access time
during the last hour, and the last element in the list corresponds to the latest, newest access from
some IP address to our service. Let's jump from here right into the pseudo code, because it's pretty
simple. We're going to have our procedure UpdateAccessList, which takes in the log file log. It also
takes in i, which is the index of the first log line that hasn't been processed yet. And it also has input
L, which is the list: instead of the abstract data structure C from the first videos, and instead of
the array A from the direct addressing scheme, we pass the parameter L, which is a list, into this
procedure, and this is the list of active IP addresses. Our code has two parts: the first deals with new
lines and the second deals with old lines. We just go through the log starting from the first unprocessed
line, and if a line needs to be added to our list because the access happened during the last hour, we
just append it to the end of the list. And now again, the last element of the list corresponds to the
latest, newest access from some IP address. Note that in our list we will store not just the IP address
but both the IP address and the time of the access.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
And then we will go to the next element in the log file, and go on while we still have some log lines
which we need to add to the end of our list. In the second part, we just look at the oldest event
during the last hour, which corresponds to the first element of the list. And if it is actually
before the start of the last hour, then we need to remove it from the list, and so we just do L.Pop.
And we do that while the head of the list is still too old.
Play video starting at 2 minutes 39 seconds and follow transcript2:39
And when we stop, it means that all the elements in the list have times within the last
hour. Why is that? Because the list is always kept in order of increasing time of access. When we
add new log lines to the list, we add only those whose time is at least that of the current last element
of the list, and when we remove something from the list, we remove the oldest entries. So, all the
entries are always sorted, and as soon as we have removed everything from the start which is too old,
all the entries in the list are not too old: they were made during the last hour.
Play video starting at 3 minutes 19 seconds and follow transcript3:19
So this is pretty simple and now we need to answer questions like, whether my IP address was used
during the last hour to access the service and how many times. To answer the first one we just need
to
Play video starting at 3 minutes 31 seconds and follow transcript3:31
find out whether there is an element in our list with the given IP address. And that is done by
FindByIP, which differs from the standard find procedure of the list in that we search not by
the whole object, which is a log line containing both IP address and time, but just by
the first field, the IP address. So our list contains tuples of IP addresses and times of access, and we
only look at the IP address. But the implementation will be the same: we'll just go from the head of the
list to the end of the list, and compare the IP field of the log lines with the IP address given as the
input. If it coincides, we will return this element; otherwise we'll return that there is nothing with
this IP address in the list, using some special value, null. Then, in
AccessedLastHour, we just compare the result with null. If it's not null, then this IP address is in the list;
otherwise it's not.
Play video starting at 4 minutes 36 seconds and follow transcript4:36
And to count the number of times
Play video starting at 4 minutes 40 seconds and follow transcript4:40
our service was accessed from a particular IP address, we just need to count the number of log lines in
the list which have the same IP address. And that can be done by the procedure CountIP of the list, which
again differs from the standard count procedure of the list in that it counts by the first field,
not by the whole object, which is a log line. It just goes from the head of the list to the end, compares
the IP field with the given IP, increases the counter by 1 each time they coincide, and returns the
counter in the end. So this is all the implementation.
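Here is a minimal Python sketch of the list-based approach (the names are ours). A collections.deque stands in for the list, since it supports both appending at the tail and popping from the head in constant time:

```python
from collections import deque


def update_access_list(log, i, window, now):
    """Keep `window` holding exactly the (time, ip) lines of the last hour."""
    # First part: append new log lines to the tail of the window.
    while i < len(log) and log[i][0] <= now:
        window.append(log[i])
        i += 1
    # Second part: pop lines older than one hour from the head.
    while window and window[0][0] <= now - 3600:
        window.popleft()
    return i


def find_by_ip(window, ip):
    """FindByIP: linear scan comparing only the IP field of each entry."""
    for entry in window:
        if entry[1] == ip:
            return entry
    return None  # the special value meaning "no such IP in the window"


def count_ip(window, ip):
    """CountIP: count window entries whose IP field equals the given IP."""
    return sum(1 for entry in window if entry[1] == ip)
```

Both find_by_ip and count_ip scan the whole window in the worst case, which is exactly the linear-time drawback analyzed next.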
Now let's analyze it. Let N be the number of currently active IPs; then the memory consumption is
O(N), because we only store the active IP addresses and the corresponding times of access, and the
times of access only add constant memory per active IP. So it's overall linear in the number of active
IPs, which is much better than the direct addressing scheme, because that required an amount of memory
proportional to the number of all possible IP addresses, while here we only require an amount of memory
proportional to the number of currently active IP addresses. What about running time? We know that the
standard list procedures such as Append, Top and Pop all work in constant time, and that's why
UpdateAccessList works in constant time per log line. Of course, any particular call to
UpdateAccessList could take more than a constant number of operations, if we need to add many new
lines to the end of the list or remove many, many old lines from the start of the list. But each log
line will be appended at most once and removed from the beginning at most once.
So it's constant time per log line, plus constant time per call of UpdateAccessList just to check
whether we need to append something and whether we need to remove something from the
beginning. And this amount of operations can be controlled by how often we actually call
UpdateAccessList.
Play video starting at 6 minutes 44 seconds and follow transcript6:44
What about answering the questions? We know that FindByIP and CountIP have to go through the
whole list in the worst case, and CountIP actually has to go through the whole list every time, to find
out how many log lines have the same IP as the given one. So AccessedLastHour and
AccessedCountLastHour are both linear in the number of active IPs. And that is actually not good,
because even without introducing any additional data structures, we could just take the log file, take
the last line in it before the current time, and go back from it, looking through each log line,
comparing its IP address with the IP address in question, and counting how many times it occurs
during the last hour, stopping as soon as we cross the border of the last hour. And that would
take the same time without any additional data structure. So this solution is not more clever than the
trivial approach.
Play video starting at 7 minutes 43 seconds and follow transcript7:43
So, we failed somewhat with the direct addressing scheme, and we failed with this list-based approach.
Is it overall a failure? Well, no. In the next videos we'll combine the ideas from the direct addressing
scheme with the list-based approach, and we'll come up with a solution which is both good in terms of
memory consumption and much faster than the trivial approach in terms of running time.
Hash Functions
Hi, in this video, you will learn what a hash function is, how we could apply it to solve our problem with IP addresses, and why it is not straightforward to make it work. Remember, the direct addressing approach worked particularly fast, but it used a lot of memory. That's because it encoded IPs with numbers, and those numbers were sometimes huge, so we had to create an array of size 2 to the power of 32 just to have a cell for each of those numbers. What if we could encode our IP addresses with smaller numbers, for example, numbers from 0 to 999? We would still need the codes of different IP addresses which are currently active to be different, because we want a separate counter for each IP in our solution.
Let's define a hash function. Suppose we have a universe of objects S: for example, the set of all IP addresses, or the set of all files stored on your computer, or the set of all keywords of a programming language. We want to encode each object from that universe with a small number, a number from 0 to m − 1, where m is a positive integer. Any function which encodes objects from S as numbers from 0 to m − 1 is called a hash function, and m is called the cardinality of the hash function h.
So what are the desirable properties of the hash function in our problem? First, h should be fast to compute, because we need to encode some object for each query. Second, we want different values for different objects, because we want a separate counter for each IP address in our problem.
We also want to use the direct addressing scheme, because it was very fast, but with a small amount of memory. It's only logical in this case to use direct addressing with O(m) memory: just create an array A of size m, encode each IP with some value from 0 to m − 1, and store the corresponding counter in that cell of the array.
The problem is that we want a small cardinality m, and this won't work if m is smaller than the number of different objects in the universe. If we have, for example, 25 objects in the universe and m is only 10, then at least two objects will have the same code from 0 to 9, because there are only 10 different codes and 25 different objects. So this won't work for all possible universes with a small m.
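To make the definitions concrete, here is a minimal Python sketch of a hash function with cardinality m, and of the pigeonhole problem just described. The function name toy_ip_hash and its "sum of octets mod m" rule are purely illustrative assumptions, not a hash function used later in the course.

```python
# A toy hash function with cardinality m: it maps any IPv4 address
# (given as a string) to an integer in the range 0 .. m-1.
# The "sum of octets mod m" rule is an illustrative assumption.

def toy_ip_hash(ip, m):
    """Hash an IPv4 string into 0 .. m-1."""
    octets = [int(part) for part in ip.split(".")]
    return sum(octets) % m

m = 10
codes = [toy_ip_hash(ip, m) for ip in
         ("173.245.52.1", "69.171.230.5", "91.198.174.2")]
assert all(0 <= c < m for c in codes)  # every code fits the cardinality

# Pigeonhole: with 25 distinct keys and only m = 10 possible codes,
# at least two keys must receive the same code.
keys = [f"10.0.0.{i}" for i in range(25)]
assert len({toy_ip_hash(k, m) for k in keys}) < len(keys)
```

The last assertion is exactly the problem described above: no hash function of cardinality 10 can give 25 objects pairwise different codes.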
In this video, we will study chaining, which is one of the most frequently used techniques for using hashing to store mappings from one type of object to another.
So, let us define a map. We often want to store a mapping from some objects to some other objects: for example, a mapping from IP addresses to integer numbers, or from filenames to the physical locations of those files on disk, from student IDs to the names of the students, or from contact names in your phone book to the contact phone numbers.
The general definition of a map from a set of objects S to a set of values V is a data structure which has three methods. HasKey tells us whether there is an entry in the map corresponding to object O from set S. Get returns the value corresponding to object O, if there is one; if there is no such value, it returns a special value telling us that there is no entry corresponding to this object O in the map. And the last method is Set, the most important one, which sets the value corresponding to object O to v. Here, objects O are all from the set S, and values v are from the set V.
We want to implement a map using a hash function and some combination of ideas from direct addressing and the list-based solution from one of the previous videos. What we'll do is called chaining.
We will create an array of size m, where m is the cardinality of the hash function; in this case, let m be eight. This won't be an array of integers, though: it will be an array of lists. In each cell of this array, we will store a list of pairs, and each pair will consist of an object O and a value v corresponding to this object. Let's look at an example.
Say our objects are IP addresses, and the values are the corresponding counters, as in our initial problem about a web service and the IP addresses of its clients. Now we're processing the log, and we see an IP address starting with 173, and it so happens that the value of the hash function on this IP address is four. Then we look at cell four; the list there is empty so far, but we append the pair of our IP address and the corresponding counter, one, to this list. The value is one because this is the first time we encounter this IP.
Now we look at the next IP in the log. It starts with 69, and the hash value for this IP is one. So we look at cell number one, and we append the pair of this IP address and the corresponding counter, one, to the list. Again the counter is one because this is the first time we see this IP address.
Now we look at the next IP address in the log and see that it again starts with 173, and in fact it coincides with the first IP we've already seen. The hash value is again four, because a hash function is deterministic: it always returns the same number for the same object. So we look at cell number four, go through the whole list, and find that there is already a pair containing this IP address as the key. So instead of appending this IP address to the list again, we increase the value of the counter by one, because this is the second time we've seen this IP address. Of course, in the interface of a general map there is no method for incrementing a counter, only a method Set. So we first use Get to get the value corresponding to this IP address, obtaining one; we then increase it by one ourselves, getting two; and then we call Set for this IP address and value two, which simply rewrites the value from one to two in this list element.
Then we look at the next line in our log, and we see an IP starting with 91. It so happens that the hash value for this IP address is again four, although this is a different IP address. That has to happen at some point, because there are many, many different IP addresses and only eight entries in our array.
So what do we do? We look at cell number four; there is a non-empty list there. We go through the whole list, but we see that our new IP address starting with 91 is not in it. So we add our new IP address to the end of this list, along with the corresponding counter of one. These two IP addresses in the list for cell number four now make a chain together. And if we go further and further through the log, adding IP addresses to this map, some of the chains will become longer. If at some point we need to remove some IP address from a list, we can do that, and the chain becomes shorter. Either way, you see the general structure: a chain, possibly empty, starts in each cell of the array. The array size is m, which is equal to the cardinality of the hash function, and in each cell we store a list of all the IP addresses which occurred before and whose hash value equals the number of the cell.
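The chaining scheme above can be sketched in Python as follows. The class name ChainedMap is hypothetical, and Python's built-in hash reduced modulo m stands in for the hash function h, which the course constructs properly in later videos.

```python
# A minimal chained hash map: an array of m lists ("chains") of
# (object, value) pairs, as described above. Python's built-in hash,
# reduced mod m, is a stand-in assumption for the hash function h;
# it is deterministic within one run of the program.

class ChainedMap:
    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]  # one (possibly empty) chain per cell

    def _chain(self, obj):
        return self.chains[hash(obj) % self.m]

    def has_key(self, obj):
        return any(key == obj for key, _ in self._chain(obj))

    def get(self, obj):
        for key, value in self._chain(obj):
            if key == obj:
                return value
        return None  # the special "no entry" value

    def set(self, obj, value):
        chain = self._chain(obj)
        for pair in chain:
            if pair[0] == obj:
                pair[1] = value          # key already present: rewrite in place
                return
        chain.append([obj, value])       # first occurrence: append to the chain

# The counter pattern from the example: Get, increment ourselves, Set.
counters = ChainedMap(8)
ip = "173.245.52.1"
counters.set(ip, (counters.get(ip) or 0) + 1)   # first sighting  -> counter 1
counters.set(ip, (counters.get(ip) or 0) + 1)   # second sighting -> counter 2
assert counters.get(ip) == 2
```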
Hash Tables
Hi, in this video, we will finally start talking about hash tables. We will define what a hash table is and what we can do with it. In the last video we introduced the notion of a map, and now we'll introduce a very similar and natural notion of a set. By definition, a set is a data structure which has at least three methods: to add an object to the set, to remove an object from the set, and to find out whether a given object is already in the set or not.
One example we already know very well: the set of all IPs through which clients accessed your service during the last hour. This is the example we've worked with for the last few videos. Another example would be to store the set of all students currently on campus. And another one is to store all the keywords of a given programming language, so that we can quickly highlight them in the text editor you use to code. There are two ways to implement a set. If you already have an implementation of a map, you can base your implementation of the set on the map: build a map from all the objects S that you need to store in the set to a set of values V which contains only two values, true and false.
If an object is in the set, the value corresponding to it will be true. If an object is not in the set, it is either not in the map, or the value corresponding to it in the map is false. But that is not a very efficient way, because we will have to store twice as many objects and values as we need. Also, when we remove objects from the set, it will be hard to remove them from the map; we would probably have to keep them with value false. So there's a better way: we can again use chaining, but instead of storing pairs of objects and corresponding values in the chains, we'll just store the objects themselves.
Let's see how we can implement that in code. Again, we have a hash function from the set of objects S to the set of integer numbers from 0 to m − 1. We denote by O and O' objects from the set S, and we initialize A with an array of size m which consists of lists, or chains, of objects O. Initially all the chains are empty.
When we need to find an object in the set, we first compute the hash value of our object, look at the corresponding cell of the array A, take the list of objects from there, and go through the whole list trying to find object O. If we find it, we return true; otherwise we return false, because our object O can only be in the list corresponding to cell number h(O) of the array A.
To implement Add, we again compute the value of the hash function on object O and take the list corresponding to this cell. We go through this list, and if we find our object O there, we don't need to do anything, because O is already in the set. Otherwise, we append our object to the list corresponding to cell number h(O).
To remove an object from the set, we first try to find it. If it's not in the set, we don't need to do anything. Otherwise, we again compute the hash value of our object, take the corresponding list, and erase our object from that list.
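The three operations can be sketched in Python like this; ChainedSet is a hypothetical name, and the built-in hash reduced modulo m again stands in for h.

```python
# A chained hash set storing objects directly (no values), mirroring
# the Find / Add / Remove operations described above. Python's built-in
# hash mod m is an assumption standing in for the hash function h.

class ChainedSet:
    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]

    def find(self, obj):
        # obj can only live in the chain for cell number h(obj)
        return obj in self.chains[hash(obj) % self.m]

    def add(self, obj):
        chain = self.chains[hash(obj) % self.m]
        if obj not in chain:       # already present: do nothing
            chain.append(obj)

    def remove(self, obj):
        chain = self.chains[hash(obj) % self.m]
        if obj in chain:           # not present: do nothing
            chain.remove(obj)

active_ips = ChainedSet(8)
active_ips.add("91.198.174.2")
active_ips.add("91.198.174.2")     # duplicate add is a no-op
assert active_ips.find("91.198.174.2")
active_ips.remove("91.198.174.2")
assert not active_ips.find("91.198.174.2")
```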
So now we are ready to say what a hash table is. A hash table is any implementation of a set or a map which uses hashing, hash functions. It doesn't even have to use chaining: there are different ways to use hash functions to store a set or a map in memory. But chaining is one of the most frequently used methods to implement a hash table.
We have a few examples of hash tables already implemented as built-in types in the standard libraries of programming languages. Set is implemented as unordered_set in C++, as HashSet in Java, and as set in Python. Map is implemented as unordered_map in C++, as HashMap in Java, and as dict (dictionary) in Python.
Why these types are called unordered in C++, you will learn in one of the next modules about data structures. For now, just know that hash tables are already implemented in the main languages we use for this specialization.
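For instance, in Python the built-in set and dict named above can be used directly for the problems from this lesson:

```python
# Python's built-in hash tables: set (a hash set) and dict (a hash map).

keywords = {"if", "else", "while"}            # keyword set for a highlighter
assert "if" in keywords                        # membership test, O(1) on average

counters = {}                                  # IP address -> counter map
for ip in ["173.245.52.1", "69.171.230.5", "173.245.52.1"]:
    counters[ip] = counters.get(ip, 0) + 1
assert counters["173.245.52.1"] == 2
```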
In conclusion, we've learned what chaining is, we've learned what a hash table is, and now we know that chaining is a technique that can be used to implement a hash table. We know that the memory consumption of the chaining technique is O(n + m), where n is the number of objects currently stored in the hash table and m is the cardinality of the hash function. We also know that the operations on such a hash table implemented using chaining work in time O(c + 1), where c is the length of the longest chain.
Now the question is: how do we make both m and c small? Why do we need that? Because we want both small memory consumption and fast operations. If m is very big, we could just use direct addressing or something like that, but for some universes of objects we would use too much memory, or have too much overhead on top of the O(n) memory which is needed to store the n objects anyway. If m is small but c is big, we are not much better off than with the list-based approach, where we used only O(n) memory to store the list of active IPs, but then had to spend O(n) time to look through the whole list every time we wanted to answer a query. So we want both m and c to be relatively small. How can we do that? We can do it with a clever selection of the hash function, and we will discuss this topic in the next lessons.
Chaining Implementation and Analysis
Slides
Download the slides for this lesson:
07_hash_tables_1_intro.pdf (PDF file)
References
See chapter 1.5.1 in [DPV]: Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani. Algorithms (1st Edition). McGraw-Hill Higher Education, 2008.
See chapters 11.1 and 11.2 in [CLRS]: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Hash Functions
Let's try a few hash functions that come to mind. First, let's select a cardinality of 1,000 and choose the first three digits as the hash value of a phone number. For this phone number, it would be 800, because the first three digits are 800.
However, there is a problem with this hash function: the area code, which is the first three digits, will be the same for many, many people in your phone book, probably because they live in the same city as you. So the hash values of their phone numbers will be the same, and they will make up a very long chain.
Another idea is to take the last digits: again the cardinality is 1,000, and we take the last three digits as the hash value. For this number, it will be 567. But there can still be a problem if there are many phone numbers in your phone book which, for example, end in three zeros, or in some other combination of three digits. So another approach is to just select a random value as the hash function, a random number between 0 and 999. Then the distribution of hash values will be very good, and the longest chain will probably be short. However, we cannot actually use such a hash function, because when we call it again to look up the phone number we stored in the phone book, we won't find it: we are looking in the wrong place, since the value of the hash function changed. It's not deterministic. So we've learned that a hash function must be deterministic: it must return the same value each time it's given the same phone number as input. Good hash functions are deterministic, fast to compute (because we compute them every time we need to store, modify or find something in our hash table), and they should distribute the keys well among the cells and have few collisions.
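The two toy phone-number hash functions above, sketched in Python. Representing phone numbers as plain digit strings is an assumption made here for illustration.

```python
# Two toy hash functions of cardinality 1000 for phone numbers
# (given as digit strings): the first three digits, and the last three.

def first_three_digits(phone):
    return int(phone[:3])

def last_three_digits(phone):
    return int(phone[-3:])

assert first_three_digits("8001482567") == 800
assert last_three_digits("8001482567") == 567

# The area-code problem: contacts from one city share a prefix,
# so first_three_digits sends them all into the same chain.
same_city = ["4251482567", "4259991234", "4250000001"]
assert len({first_three_digits(p) for p in same_city}) == 1
```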
Unfortunately, there is no universal hash function. More specifically, the lemma says: if the number of all possible keys, the size of the universe of keys, is large enough, much larger than the cardinality of the hash function that we want to use to save memory, then for any specific deterministic hash function there is a bad input which results in many, many collisions. Why is that? Let's look at the universe U and select some cardinality, for example, 3.
Then our universe is divided into three groups: all the keys with hash value 0, all the keys with hash value 1, and all the keys with hash value 2. Now let's select the biggest of those groups; in this case, it's the group with hash value 1. This group will definitely be of size at least one third of the whole universe, and it can be even bigger, in this case, for example, around 42%. If we take all of these keys, or a significant part of them, as an input, they will all have the same hash value. So all of them will collide with each other, they will form a very long chain in the hash table, and everything will work very slowly. Of course, if we change the hash function, it may distribute these particular keys more uniformly among the hash values, but for this particular hash function, this is a bad input. And for any specific hash function, with any cardinality, we'll be able to select a bad input this way. In the next video, you will learn how to solve this problem.
Universal Family
Hi, in the previous video you learned that for any deterministic hash function there is a bad input on which it will have a lot of collisions. In this video, you will learn to solve that problem. The idea goes back to when you studied the QuickSort algorithm. At first, you learned that it can work as slowly as n squared time. But then you learned that adding a random pivot to the partition procedure helps, because now you know that QuickSort works on average in n log n time, and in practice it usually works faster than other sorting algorithms. We want to use the same randomization idea here for hash functions. But we already know that we cannot just use a random hash function, because a hash function must be deterministic. So instead, we will first create a whole set of hash functions, called a family of hash functions, and we'll choose a random function from this family to use in our algorithm. Not all families of hash functions are good, however, and so we will need the concept of a universal family of hash functions. Let U be the universe, the set of all possible keys that we want to hash, and let a set of hash functions, denoted by the calligraphic letter H, be a set of functions from U to the numbers between 0 and m − 1, that is, hash functions with the same cardinality. Such a set is called a universal family if for any two different keys in the universe the probability of collision is small.
So what does that mean? A hash function is a deterministic function, so for any two keys it either has a collision on those two keys or it doesn't. What does it mean, then, that the probability of collision for two different keys is small? It means that if we look at our family H, at most a 1/m fraction of all hash functions in the family have a collision on these two keys. And if we select a random hash function from the family, then with probability at least 1 − 1/m, which is very close to one, there will be no collision between these two keys for this hash function. Of course, it is essential that the keys are different: if the keys are equal, then any deterministic hash function will have the same value on both. So this small-probability collision property is required only for pairs of different keys, but it must hold for any two different keys in the universe. It might seem that this is impossible, but later you will learn how to build a universal family of hash functions in practice.
So how does the randomization idea work in practice? One approach would be to just make one hash function which returns a random value between 0 and m − 1, each value with the same probability. Then the probability of collision for any two keys is exactly 1/m. But that is not a universal family. In fact, we cannot use this "family" at all, because the hash function is not deterministic, and we can only use deterministic hash functions.
So instead, we need a set of hash functions such that every hash function in the set is deterministic. We then select a random function h from this set, and we use the same fixed function h throughout the whole algorithm, so that we can correctly find, for example, all the objects that we store in the hash table.
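As a sketch of this idea in Python: the family h_{p,a,b}(x) = ((a·x + b) mod p) mod m used below is the one this course introduces for integers in a later video; the point here is only that the random choice of (a, b) is made once, and the resulting function is then fixed and deterministic for the whole algorithm.

```python
import random

# Choose one hash function at random from a family when the table is
# created, then use that same fixed function for every operation.
# The family ((a*x + b) mod p) mod m, with p a fixed prime, is the
# integer family introduced later in the course.

def make_random_hash(m, p=10_000_019):
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    def h(x):                  # deterministic once (a, b) are fixed
        return ((a * x + b) % p) % m
    return h

h = make_random_hash(1000)
# Because h is fixed, looking up a stored key always probes the same cell:
assert h(1482567) == h(1482567)
assert 0 <= h(1482567) < 1000
```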
So, there is a lemma about the running time of operations on a hash table if we use a universal family. If the hash function h is chosen at random from a universal family, then on average the length of the longest chain in our hash table will be bounded by O(1 + α), where α is the load factor: the ratio n/m of the number of keys stored in the hash table to the size of the hash table, which is the same as the cardinality of the hash functions in the universal family that we use. This makes sense: if the load factor is small, we store only a few keys in a large hash table, so the longest chain will be short.
As our table gets filled up, the chains grow. This lemma says, however, that if we choose a random function from a universal family, they won't grow too much: on average, the longest chain will still be of length O(1 + α). And that is probably just a small number, because α is usually below one; you don't want to store more keys in the hash table than the size of the hash table allocated. So α will be below 1 most of the time, and then 1 + α is at most two, a constant. The corollary is that if h is chosen at random from a universal family, then operations on the hash table run on average in constant time.
Now the question is how to choose the size of your hash table. You control the amount of memory used with m, which is the cardinality of the hash functions and is equal to the size of the hash table. But you also control the speed of the operations. Ideally, in practice, you want your load factor α to be between 0.5 and 1. You want it to be below 1 because otherwise you store too many keys in the same hash table and everything becomes slow. But you also don't want α to be too small, because then you waste a lot of memory. If α is at least one half, then you use essentially linear memory to store your n keys, and your memory overhead is small. Operations still run in time O(1 + α), which is constant on average if α is between 0.5 and 1.
The question is what to do if you don't know in advance how many keys you want to store in your
hash table. Of course, there is a solution to start with a very big hash table, so that definitely all the
keys will fit. But this way you will waste a lot of memory. So, what we can do is copy the idea you
learned in the lesson about dynamic arrays. You start with a small hash table and then you grow it
organically as you put in more and more keys. Basically, you resize the hash table and make it twice
bigger as soon as alpha becomes too large. And then, you need to do what is called a rehash. You
need to copy all the keys from the current hash table to the new bigger hash table. And of course, you
will need a new hash function with twice the chronology to do that. So here is the code which tries to
keep loadfFactor below 0.9. And 0.9 is just a number I selected, you could put 1 here or 0.8, that
doesn't really matter. So first we compute the current loadFactor, which is the ratio of the number of
keys stored in the table to the size of the hash table. And if that loadFactor just became bigger than
0.9, we create a new hash table of twice the size of our current hash table. We also choose a new
random hash function from the universal family with twice the cardinality coresponding to the new
hash table size. And then we take each object from our current hash table, and we insert it in the new
hash table using the new hash function. So we basically copy all the keys to the new hash table. And
then we substitute our current hash table with the bigger one and the current hash function with the
hash function corresponding to the new hash table. That way, the loadFactor roughly halves. Because
we probably just added one new element, the loadFactor became just a little
more than 0.9. And then we doubled the size of the hash table while the number of keys stayed the
same, so the loadFactor became roughly 0.45, which is below 0.9, which is what we wanted.
So to achieve that, you need to call this procedure rehash after each operation which inserts
something in your hash table. And it could work slowly when this happens because the rehash
procedure needs to copy all the keys from your current hash table to the new big hash table, and that
works in linear time. But similarly to dynamic arrays, the amortized running time will still be constant
on average, because the rehash will happen only rarely. So you reach a certain level of load factor and
you double the size of your table. And then it will take twice as long to again reach too high a
value of load factor. And then you'll again double your hash table. So the more keys you put in,
the longer it takes until the next rehash. So rehashes will be really rare, and that's why they won't
influence the running time of your operations significantly.
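The resizing scheme described above can be sketched in Python. This is an illustrative sketch with chaining, where Python's built-in hash stands in for a randomly chosen hash function from a universal family; the class and method names are mine, not the lecture's.

```python
class HashTable:
    def __init__(self, size=4):
        self.size = size                        # number of buckets (m)
        self.count = 0                          # number of stored keys (n)
        self.buckets = [[] for _ in range(size)]

    def _hash(self, key):
        # stand-in hash; the lecture would use a random universal hash function
        return hash(key) % self.size

    def insert(self, key):
        self.buckets[self._hash(key)].append(key)
        self.count += 1
        self._rehash()                          # keep the load factor below 0.9

    def _rehash(self):
        load_factor = self.count / self.size
        if load_factor > 0.9:
            old_buckets = self.buckets
            self.size *= 2                      # twice bigger hash table
            self.buckets = [[] for _ in range(self.size)]
            for bucket in old_buckets:          # copy all keys with the new hash
                for key in bucket:
                    self.buckets[self._hash(key)].append(key)
```

Each insert occasionally pays for a linear-time rehash, but as with dynamic arrays the amortized cost per insertion stays constant.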
Hashing Integers
Hi, in the previous video, you've learned the concept of universal family of hash functions and you
learned how to use it to make operations with your hash table really fast.
However, now we need to actually build a universal family, and we will start with a universal family
for the most important kind of object, which is integers. Because any object on your computer is
represented as a series of bits or bytes, you can think of it as a sequence of integer numbers.
And so first, we need to learn to hash integers efficiently, so we will build a universal family for
hashing integers. But we will look at our example with phone numbers, because we need to store
contacts in our phone. First, we will consider only phone numbers of length up to seven; for
example, we will consider the phone number 148-2567. We'll convert all of those phone
numbers to integers from zero to the number consisting of seven nines, 9,999,999. For
example, our selected phone number converts to 1,482,567. And then we will hash those integers
to which we convert our phone numbers. To hash them, we will also need to choose a big prime
number, bigger than 10 to the power of 7; for example, 10,000,019 is a suitable prime number. And
we will also need to choose the hash table size m, which is the same as the cardinality of the hash
function that we need. So now that we've selected p and m, we are ready to define a universal family
for integers between 0 and 10 to the power of 7 minus 1.
So the Lemma says that the following family of hash functions is a universal family.
What is this family? It is indexed by p; p is the prime number we chose, in this case 10,000,019.
And it also has parameters a and b; those parameters are different for different hash functions in
this family. Basically, if you fix a and b, you fix a hash function from this family of hash functions,
calligraphic H with index p. And x is the key, the integer number that we want to hash, and it is
required that x is less than p, that is, from 0 to p minus 1. So, to compute the value of some hash
function on this integer x, we first make a linear transform of x: we multiply it by the a corresponding
to this hash function, and add the b corresponding to this hash function. Then we take the result
modulo our big prime number p. And after that, we again take the result modulo the size of our
hash table, or the cardinality of the hash functions that we need. So all these hash functions indexed
by a and b will have the same cardinality m.
And the size of this hash family, what do you think it is?
Well, it is equal to p multiplied by p minus 1. Why is that? Because there are p minus 1 variants for a,
and independently from that, there are p variants for b. So the total number of pairs a and b is p
multiplied by p minus 1, and that is the size of our universal family. And the Lemma states that it really will
be a universal family for integers between 0 and p minus 1. We will prove this Lemma in a separate,
optional video. And here, we'll look at an example of how this universal family works.
So, for example, we selected hash function corresponding to a = 34 and b = 2, so this hash function h
is h index by p, 34, and 2.
And we will compute the value of this hash function on the number 1,482,567, because this integer
corresponds to the phone number we're interested in, which is 148-2567. Well,
remember that the p we chose is the prime number 10,000,019. So first, we multiply our number x by
34 and add 2; after that, we take the result modulo p, modulo 10,000,019, and the result is
407,185. Then we take this result again modulo 1,000, and the result is 185. And so the
value of our selected hash function on the number x is 185. For any other number x, you would do
the same: multiply x by 34, add 2, take the result modulo p, then take the result modulo
1,000. And so any value of our hash function is a number between 0 and 999, as we want.
And for different a and b, instead of 34 and 2, we'll just multiply x by the different a, add the different
b, take the result modulo p, then take the result modulo m, and get the value of our hash function.
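The worked example can be checked in a few lines of Python; the function name universal_hash is mine, not the lecture's:

```python
def universal_hash(x, a, b, p, m):
    # h_{p,a,b}(x): linear transform modulo the big prime p, then modulo m
    return ((a * x + b) % p) % m

p = 10_000_019   # prime number bigger than 10**7
m = 1_000        # hash table size (the cardinality we want)
x = 1_482_567    # the phone number 148-2567 as an integer

print((34 * x + 2) % p)                # → 407185
print(universal_hash(x, 34, 2, p, m))  # → 185
```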
So in the general case, when the phone numbers can be longer than seven, we first define the
maximum allowed length, L, of the phone number. And again, convert all the phone numbers to
integers which will range from 0 to 10 to the power of L - 1, and then we'll hash those integers. To
hash those integers, we'll choose a sufficiently large number p, p must be more than 10 to the power
of L for the family to be universal. Because otherwise, if we take some p less than 10 to the power of
L, there will exist two different integer numbers between 0 and 10 to the power of L- 1, which differ
by exactly p. And then, when we compute the value of some hash function on both those numbers
and take the linear transformation of those keys modulo p, the values of those transformations will be
the same. And then when we again take the result modulo m, the value will again be the same. And that
means that for any hash function from our family, the value of its function on these two keys will be
the same. So there will be a collision for any hash function from the family, but that contradicts the
definition of universal family. Because for a universal family and for two fixed different keys, no more
than 1 over m part of all hash functions can have collision for these two keys. And in our case, all hash
functions have a collision for these two keys, so this is definitely not a universal family. So we must
take p more than 10 to the power of L, and in fact, that is sufficient. Then, we choose hash table of
size m, and then we use our universal family, calligraphic H with index p. We choose a random hash
function from this universal family, and to choose a random hash function from this family, we need
to actually choose two numbers, a and b. And a should be a random number between 1 and p-1, and
b should be an independent random number from 0 to p-1. Once we've selected those two numbers,
our hash function is completely defined.
So now we know how to solve the problem of phone book in the direction from phone numbers to
names. So we first define the longest allowed length of the phone number. We convert all the phone
numbers to integers from 0 to 10 to the power of L -1.
We choose a big prime number, bigger than 10 to the power of L. We choose the size of the hash
table that we want based on the techniques you learned in the previous video, and then you add the
contacts to your phone book as a hash table of size m, hashing them by a hash function randomly
selected from the universal family, calligraphic H with index p. And that is the solution in the direction
from phone numbers to names. This solution will take O(m) memory, and you can control m,
and it will work on average in constant time if you select m wisely using the techniques from the
previous video.
And now we also need to solve our phone book problem in the different direction, from names to
phone numbers. And that we will do in the next video.
Hashing Strings
Hi, in the previous videos, you've learned how to quickly look up a name in your phone book given the
phone number. And now we want to learn to solve the reverse problem: given a name, look up the
phone number of the corresponding person.
To do that, we need to implement the Map from names to phone numbers.
And we can again use hash tables, and we can again use chaining as in the previous sections. But we
need to design a hash function that is defined on names. More generally, we want to learn to hash
arbitrary strings of characters. And by the way, in this video you will also learn how hashing of strings
is implemented in the Java programming language. But first, let's introduce a new notation. Denote
by |S|, S enclosed in vertical lines, the length of string S. For example, the length |"a"| is 1, the length
|"ab"| is 2, and the length |"abcde"| is 5. So now, how do we hash strings? Well, when we're
given a string, we're actually given a sequence of characters from S[0] to S[|S| - 1]. We number
the characters of the string from 0 in this lecture, and S[i] is the individual character in the i-th
position of the string.
I say that we should use all the characters when we compute our hash function of a string. Indeed, if
we don't use the first character, there will be many collisions. For example, if the first symbol of the
string is not used, then the hash value of strings ("aa"), ("ba") and so on, up to ("za") will be the same.
Because however we compute the value of the hash function, it doesn't use the value of the first
character. And if everything else in the strings stays the same, and we only change the first character
that doesn't influence the value of the hash function then the value of the hash function must be the
same. And so there will be a lot of collisions and we want to avoid collisions. So we need to use value
of each of the characters.
Now, we could do a lot of things with that. For example, sum the values of all the characters or
multiply them, but we'll do something different.
It is a polynomial sum where we take the integer code corresponding to the i-th character of S,
which we denote by S[i], the same as the character itself, and multiply it by x to the power of i. We sum
all these terms up, and we take the value modulo p. So this is a family of hash functions, and the
cardinality of all those hash functions is p. So any such hash function returns a value from 0 to p - 1.
And how many hash functions are there in this family? Well, there are exactly p - 1 different
hash functions, because to define a hash function from this family you just need to
choose the value of x. And x ranges from 1 to p - 1, and it is an integer number, of course.
So how can we implement a hash function from this family?
So, the procedure PolyHash, which takes as input a string S, a prime number p, and a parameter x,
implements a hash function from our family. It starts by assigning the value 0 to the result, the
hash value it will return in the end. Then it goes from right to left through our string and computes a new
value based on the value of the corresponding character. There is a formula in the code that does
exactly that, and I will show you by example that what we get in the end by applying this formula is
exactly what we want. So basically, we start with a hash value of 0, and if the length of our string S is 3,
we start with i equal to length of S - 1, which is 2. We have a current hash value of 0,
so we multiply 0 by x and get 0, then we add the value of S[i], which is S[2], and take it mod p.
And so after the first iteration of the for loop, we get S[2] mod p.
What happens in the next iteration? i is decreased and is now 1. We multiply the current
value, S[2], by x, add S[1], and take everything modulo p. What we get is S[1] + S[2] multiplied by x,
modulo p. Then in the last iteration, i is decreased to 0. We multiply the
current value by x, and what we get is S[1] multiplied by x + S[2] multiplied by x squared. Then we also
add S[0] to the sum and take everything modulo p. The result is S[0] + S[1] multiplied by x + S[2]
multiplied by x squared, modulo p, exactly as we wanted: a polynomial hash function with prime p and
parameter x.
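A minimal Python version of the PolyHash procedure described above, using ord as the integer code of a character, might look like this:

```python
def poly_hash(S, p, x):
    # scan the string from right to left (Horner's rule), so the result is
    # (S[0] + S[1]*x + S[2]*x**2 + ...) mod p
    h = 0
    for i in range(len(S) - 1, -1, -1):
        h = (h * x + ord(S[i])) % p
    return h
```

For a three-character string this unrolls to exactly the three iterations traced above.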
And by the way, the implementation of the built-in hashCode method of the class String in Java is
very similar to our procedure PolyHash. The only difference is that it always uses x = 31, and for some
technical reasons it avoids the modulo p operation; it just computes the polynomial sum without any
modular division. So now you know how a function that is used probably trillions of times a day, by
many thousands of different programs, is implemented.
So now about the efficiency of our polynomial family.
First, the Lemma says that for any two different strings s1 and s2 of length at most L + 1, if you choose
a random hash function from the polynomial family by selecting a random value of the parameter x from
1 to p - 1, then the probability of collision on these two different strings is at most L
divided by p.
That doesn't seem like a good estimate, because L can be big, but it is actually in your power to choose
p. If you choose a very, very big prime number p, then L over p will be very small. And note that it won't
influence the running time of the PolyHash procedure, because the running time of this procedure is
big O of the length of S: it only depends on the length of the string, not really on the size of the number
p. So if you select a really big number p, then the probability of collision will be very small
and the hash function will still be computed very fast. The idea of the proof of this Lemma is that a
polynomial equation of degree L modulo a prime number p has at most L different solutions
x. Basically, when we consider two strings S1 and S2, the fact that the hash value of some hash
function from the polynomial family is the same for these two strings means that the x corresponding to
our hash function is a solution of such an equation. And the fact that the strings are different ensures
that at least one of the coefficients of this equation is different from 0, and that is essential. If the
strings were the same, of course, the value of any hash function on them would be the same. But if
they're different, then the probability of collision is at most L over p, because there are only L or fewer
different x for which the hash function can give the same value on these two strings.
Then the probability of collision for a random function from that family is at most 1 over m + L over p.
So that is not a universal family, because for a universal family there shouldn't be any summand L
over p; the probability of collision should be at most 1 over m. But we can be very, very close to a
universal family, because we can control p. We can make p very big, and then L over p will be very
small. And so the probability of collision will be at most 1 over m plus some very small number,
so it will be either even less than 1 over m or very close to it. So polynomial hashing, followed by
universal hashing for integers, is a good construction of a family of hash functions.
A Corollary from the previous Lemma is that if we specifically select the prime number p to be bigger
than m multiplied by L, then the probability of collision will be at most big O of 1 over m. So it won't be
1 over m itself, but it will be at most 1 over m multiplied by some constant. Why is that? Well,
if we replace 1 over m plus L over p with 1 over m + L over mL, then the second expression is
bigger, because p is bigger than mL.
And 1 over m + L over mL is equal to 1 over m + 1 over m, which is 2 over m, that is, big O(1 over m).
So that way we proved that the combination of polynomial hashing with universal hashing for
integers is a really good family of hash functions.
Now what if we take this new family of hash functions and apply it to build a hash table?
Well, I say that for a big enough prime number p, we'll again have average running time c = O(1 + alpha),
and the average length of a chain will be O(1 + alpha), where alpha is the load factor of our hash table.
And so, by wisely controlling the size of the hash table and the load factor, as we learned in the
previous videos, we can control the running time and the memory consumption.
Of course, computing the hash function itself on some string s is not a constant time operation,
because the string can be very long, and we need to look through the whole string to compute our
hash function. But in the case when the lengths of the strings in question are bounded, as with names,
for example (there are definitely no names longer than a few hundred characters, I think), they are all
bounded by some constant L. And so computing the hash function on the names takes, of course, big O
of the length of the string time, but that is also big O of constant time, because L is a constant
itself. And so we can implement a map from names to phone numbers using chaining, using the newly
created family of hash functions, which is composite: it first applies polynomial hashing to the string,
to the name, and then applies the universal family for integers to the result. So we can choose a random
hash function from this two-stage family, and store our names and phone numbers in the hash
table, using this hash function.
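The two-stage family just described can be sketched as follows; the parameter choices and helper names are illustrative, with the polynomial hash from earlier in the lesson as the first stage:

```python
import random

def poly_hash(S, p, x):
    h = 0
    for c in reversed(S):            # Horner's rule, right to left
        h = (h * x + ord(c)) % p
    return h

def string_hash(S, p, x, a, b, m):
    # stage 1: polynomial hash of the string modulo the big prime p;
    # stage 2: the universal hash for integers maps the result into 0..m-1
    return ((a * poly_hash(S, p, x) + b) % p) % m

p, m = 10_000_019, 1_000
x = random.randint(1, p - 1)         # random member of the polynomial family
a = random.randint(1, p - 1)         # random member of the integer family
b = random.randint(0, p - 1)
bucket = string_hash("Alice", p, x, a, b, m)   # a value between 0 and m - 1
```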
In conclusion, you learned how to hash integers and strings really well, so that the probability of
collision is small. You learned that a phone book can be implemented as two maps, as two hash
tables: one from phone numbers to names, and another one back, from names to phone numbers.
And if you manage to do that in such a way that you don't waste too much memory, where the load
factor of your hash table is between 0.5 and 1, then search and modification work on average in
constant time, which is great. In the next lesson, we'll learn to apply hash functions to
different problems, such as searching for patterns in text.
Slides and External References
Slides
Download the slides for this lesson:
07_hash_tables_2_hashfunctions.pdf
07_hash_tables_2_proof_universal_family.pdf
References
See chapter 1.5 in [DPV]: Sanjoy Dasgupta, Christos Papadimitriou, and Umesh Vazirani.
Algorithms (1st Edition). McGraw-Hill Higher Education, 2008.
See chapter 11.3 in [CLRS]: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest,
Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Hi. In this lesson, you will learn about applications of hashing to problems regarding strings and texts.
We will consider the problem of finding patterns in text. The problem is, given a long text T, for
example a book or a website or a Facebook profile, and some pattern P which can be a word, a
phrase, a sentence. Find all occurrences of the pattern in the text. Some examples: you may want to
find all occurrences of your name on a website, or you may want to find all the Twitter messages
about your company to analyze the reviews of your new product. Or you could want to
detect all the files on your computer which are infected by a specific computer virus, and in that case
you won't search for letters in text; you will search for code patterns in the binary code of programs.
Anyway the algorithm will be the same.
First we introduce a new notation, the substring notation: we denote by S from i to j the substring of
string S, starting in position i and ending in position j. Both i and j are included in the substring. For
example, if S is the string "abcde", then S from zero to four is the same string "abcde", because we index
our characters from zero, and 'a' is the character number zero and 'e' is the character number four.
S from one to three is "bcd", because 'b' is the character with index one and 'd' is the character with
index three. And S from two to two is also allowed; it's a substring of length one, the string "c". And i
shouldn't be more than j, of course, because otherwise there is no substring from i to j.
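In Python, where slices are half-open, the inclusive substring S from i to j corresponds to S[i:j+1]:

```python
S = "abcde"
print(S[0:5])   # → abcde  (S from 0 to 4)
print(S[1:4])   # → bcd    (S from 1 to 3)
print(S[2:3])   # → c      (S from 2 to 2)
```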
So, the formal version of our problem to find a pattern in text is that you're given strings T and P as
input, and you need to find all such positions i in the text T
that pattern P occurs in text T starting from position i. That is the same as saying that the substring of T
from i to i plus length of P minus one, the substring of T starting from i with length equal to the length
of the pattern, is equal to the pattern. So we want to find all such positions i and, of course, i can be
from zero to length of text minus length of pattern. It cannot be bigger, because otherwise the pattern
just won't fit in the text; it would end to the right of the end of the text.
So we start with a naive algorithm to solve this problem. Basically, we go through all possible
positions i, from zero to the difference of the lengths of the text and the pattern. And then for each such
position i we just check, character by character, whether the corresponding substring of T starting
in position i is equal to the pattern or not. If it is equal to the pattern, we append position i to the
result.
First we need to implement a function to compare two strings, and we start by checking whether
their lengths are the same or not; of course, if the lengths of the strings are different, then the strings are
definitely different. If that's not the case, then the lengths of the strings are equal, and we go
through all the positions in both strings, with i going from zero to the length of the first string minus one.
If the corresponding symbols in the i-th position differ, then the strings are different; otherwise
they are the same.
Now we will use this function to find our occurrence of pattern in the text.
The procedure FindPatternNaive implements our naive algorithm. We start with an empty list in
the variable result, and then we go through all the possible positions where the pattern could start,
with i from zero to length of text minus length of the pattern, and we check whether the
substring starting in i with length equal to the length of the pattern is equal to the pattern itself. If it is,
then we append position i to the result, because this is a position where the pattern occurs in the text.
Then we just return the list that we collected by going through all possible positions of the pattern in the
text. I'd say that the running time of this naive algorithm is big O of the length of the text multiplied by
the length of the pattern.
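A Python sketch of the two procedures just described (the names follow the lecture's AreEqual and FindPatternNaive):

```python
def are_equal(s1, s2):
    # compare two strings character by character
    if len(s1) != len(s2):
        return False
    for i in range(len(s1)):
        if s1[i] != s2[i]:
            return False
    return True

def find_pattern_naive(T, P):
    # try every starting position of P in T
    result = []
    for i in range(len(T) - len(P) + 1):
        if are_equal(T[i:i + len(P)], P):
            result.append(i)
    return result

print(find_pattern_naive("abracadabra", "abra"))  # → [0, 7]
```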
Why is that? Well, each call to the function AreEqual runs in time big O of the length of the pattern,
because both strings we pass there have the same length as the pattern, and the running time of
AreEqual is linear. And then we have exactly
length of T minus length of P plus one calls of this function, which totals big O of length of T
multiplied by length of P, because we always assume that the length of the text is bigger than the length
of the pattern, and so this is the upper bound for our running time.
Actually, this is not just the upper bound, it's also a lower bound.
For example, consider a text T which consists of many, many letters 'a', and a pattern P which consists
of many, many letters 'a' and then a letter 'b' at the end, and also
choose the text so that it is much longer than the pattern, which is basically
almost always true in practical problems. For each position i in T which we try, the call to AreEqual
has to make the maximum possible number of comparisons, which is
equal to the length of the pattern P. Why is that? Because when we call AreEqual for the substring of T
starting in position i and for the pattern P, we see that they differ only in the last character, so AreEqual
has to check all of the previous characters until it comes to the last character of P and
determines that the pattern is different from the corresponding substring of T. Thus, in this case, the
naive algorithm will do a number of operations at least proportional to length of T multiplied by length
of P. So our estimate is not just big O, it is big Theta, which means that it is not only an upper bound
but also a lower bound on the running time of the naive algorithm. In the next video we will introduce
an algorithm based on hashing which has a better running time.
Rabin-Karp's Algorithm
Hi, in this video, we'll introduce Rabin-Karp's Algorithm for finding all occurrences of a pattern in the
text. At first it will have the same running time as the Naive Algorithm from the previous video. But
then we'll be able to improve it significantly for the practical purposes. So we need to compare our
pattern to all substrings S of text T, with length the same as the length of the pattern. And in the
Naive algorithm, we just did that by checking character by character whether pattern is equal to the
corresponding substring. And the idea is we could use hashing to quickly compare P with substrings of
T. So, how to do that? Well, let's introduce some hash function h, which is of course a deterministic
hash function. If we see that the value of the hash function on the pattern P is different from the value
of this hash function on some string S, then definitely P is not equal to S, because h is deterministic.
However, if the value of the hash function on P is equal to the value of the hash function on S, then P
can be equal to S, or it can be different from S if there is a collision. So to check exactly whether P is
equal to S or not, we will need to call our function AreEqual(P, S). And so this doesn't yet save us any
time, but we hope that we can call this function AreEqual less frequently, because there will be only a few
collisions. So we'll use the polynomial hashing family, calligraphic P with index p, with some big
prime number p. And if the pattern P is not equal to a substring S of the text, then the probability that
the value of the hash function on the pattern is the same as the value of the hash function on the
substring is at most the length of the pattern divided by our big prime number p.
And we'll choose a prime number p big enough so that this probability will be very small. So here is
the code of our algorithm RabinKarp. It takes as input a text T and a pattern P.
And it starts by initializing the hash function from the polynomial family. We first choose a very big
prime number p. We'll talk later about how to choose it, how big it should be. And we also choose a
random number x between 1 and p - 1, to choose the specific hash function from the polynomial family.
We initialize our list of positions where the pattern occurs in the text with an empty list.
We also precompute the hash value of our pattern, and we call the PolyHash function to do that.
And then we again need to go through all possible starting positions of the pattern in the text. So we let i go from zero to the difference of the lengths of the text and the pattern.
And for each i, we take the substring starting in position i with length equal to the length of the pattern, which is T from i to i plus the length of the pattern minus 1, and we compute the hash value of this substring. Then we look at the hash of the pattern and the hash of the substring. If they are different, then definitely P is not equal to this substring, so P doesn't occur in position i, and we don't need to do anything in this iteration; we just continue to the next iteration of the loop without calling AreEqual. However, if the hash values pHash and tHash are equal, then we need to check whether P is really equal to the substring of T starting in position i, or it is just a collision of our hash function. To do that we make a call to AreEqual and pass it the substring and the pattern. If AreEqual returns true, it means that the pattern is really equal to the corresponding substring of the text, and then we append position i to the result, because pattern P occurs in position i in the text T. Otherwise we just continue to the next iteration of our for loop. So this is more or less the same as the naive algorithm, but we have an additional check of hash values, and so we're not always calling AreEqual. We call AreEqual either if P is equal to the corresponding substring of T or if there is a collision. Let's estimate the running time of this algorithm.
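Before doing so, here is a compact sketch of this basic variant (hypothetical Python; AreEqual is modelled by direct string comparison):

```python
import random

def poly_hash(s, p, x):
    # polynomial hash evaluated with Horner's rule, modulo a prime p
    h = 0
    for c in reversed(s):
        h = (h * x + ord(c)) % p
    return h

def rabin_karp_basic(text, pattern):
    p = 1_000_000_007                # big prime, much larger than |T|*|P|
    x = random.randint(1, p - 1)     # random hash function from the family
    positions = []
    p_hash = poly_hash(pattern, p, x)
    for i in range(len(text) - len(pattern) + 1):
        substring = text[i:i + len(pattern)]
        if poly_hash(substring, p, x) != p_hash:
            continue                 # hashes differ: definitely no match at i
        if substring == pattern:     # AreEqual: character-by-character check
            positions.append(i)
    return positions
```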
So first we need to talk about false alarms. We'll call a false alarm the event when P is compared with a substring of T from i to i plus the length of P minus 1 inside the AreEqual procedure, but the pattern P is actually not equal to this substring. It's a false alarm in the sense that P doesn't occur in the text T starting from position i, but we still called the AreEqual function, and we need to go character by character through P and the substring to establish that they are actually not equal. The probability of a false alarm, as we know from the previous lesson, is at most the length of the pattern over the prime number p which we choose. So on average, the total number of false alarms is the number of iterations of our for loop multiplied by this probability.
And so this total number of false alarms can be made very small if we choose the prime number p much bigger than the product of the length of the text and the length of the pattern. So now let's estimate the running time of everything in our code except for the calls to the AreEqual function. The hash value of the pattern is computed in time big O of the length of the pattern.
Hash of the substring corresponding to the pattern is computed in the same big O of length of the
pattern time. And this is done length of text minus length of the pattern plus 1 times because that is
the number of iterations of the for loop.
So the total time to compute all those hash values is big O of length of text multiplied by the length of
the pattern.
Now what about the running time of all calls to AreEqual? Each call to AreEqual is computed in big O
of length of the pattern because we pass there are two strings of length equal to length of the
pattern.
However, AreEqual is called only when the hash value of the pattern is the same as the hash value of the corresponding substring of T. And that means that either P occurs in position i in the text T or there was a false alarm.
And by selecting the prime number to be very big, much bigger than the product of the length of the text and the length of the pattern, we can make the number of false alarms negligible, at least on average. So, if q is the number of times that the pattern P is actually found in different positions in the text T, then the number of calls to AreEqual, on average, is big O of q, the number of times P is really found, plus the quantity (length of T minus length of P plus 1) multiplied by the length of P and divided by the prime p, which is the average number of times that a false alarm happens. So q plus the number of false alarms is the number of times that we need to actually call the function AreEqual, and the time spent inside each call to AreEqual is proportional to the length of the pattern.
So, this is the same as big O of q multiplied by the length of the pattern, because the second summand can be made pretty small, less than 1, if we choose a big enough prime number p, and then only the first summand multiplied by the length of the pattern remains.
And now the total running time of the Rabin-Karp algorithm in this variant is big O of the length of the text multiplied by the length of the pattern, plus q multiplied by the length of the pattern. But of course we know that the number of times the pattern occurs in the text is not bigger than the number of characters in the text, because there are only so many different positions where the pattern could start in the text. So this sum is dominated by big O of the length of the text multiplied by the length of the pattern.
So, this is basically the same running time as our estimate for the naive algorithm. So we haven't
improved anything yet, but this time can be improved for this algorithm with a clever trick. And you
will learn it in the next video.
Optimization: Precomputation
Hi, in this video you will learn how to significantly improve the running time of the Rabin-Karp algorithm. And to do so we'll need to look closer into polynomial hashing and its properties. Recall that to compute a polynomial hash of a string S, we first choose a big prime number p for the polynomial family, then we choose a random integer x from 1 to p minus 1 to select a random hash function from the family. The value of this hash function is then the polynomial in x with coefficients which are the characters of the string S.
And to compute this hash function on a substring of the text T starting in position i and having the same length as the pattern for which we are looking in the text, we need to compute a similar polynomial sum. It goes from character number i to character number i plus the length of the pattern minus 1, and we need to multiply each character by the corresponding power of x. For example, T[i] will be multiplied by x to the power of zero, because it is the first character of the substring, and the last character will be multiplied by x to the power of the length of the pattern minus 1; here is the formula on the slide. The idea for improving the running time is that the polynomial hash values for two consecutive substrings of the text with length equal to the length of the pattern are very similar, and one of them can be computed from the other in constant time. We introduce a new notation: we denote by H[i] the hash value for the substring of the text starting in position i and having the same length as the pattern.
Now let's look at an example: our text is a, b, c, b, d.
And we need to convert the characters to their integer codes. And let's assume for simplicity that the
code for a is zero, for b is one, for c is two, and for d is three. Then our text is actually 0, 1, 2, 1, 3.
Also, we will assume in this example, that the length of the pattern is three. We don't need to know
the pattern itself, we just fix its length.
So we will need to compute hash values for the substrings of the text of length three. There are three of them: abc, bcb, and cbd. We start with the last one, cbd. To compute its hash value, we first write down the powers of x under the corresponding characters of the text.
Then we multiply each power of x by the corresponding integer code of the character, and we get 2, x, and 3x squared. Then we sum them; we also need to take the value modulo p, but on this slide we'll just leave modulo p implicit in each expression. Now let's look at the hash value for the previous substring of length three, which is bcb. We again write down the powers of x under the corresponding integer codes of the characters, multiply the powers of x by the corresponding integer codes, getting 1, 2x, and x squared, and sum them. Now note the similarity between the hash value for the last substring of length three and the previous substring of length three. To get the last two terms for bcb, we can multiply the first two terms for cbd by x.
And we will use this similarity to compute the hash for bcb given the hash for cbd. So again, H[2] is the hash value of cbd, because cbd starts at the character with index two, and it's equal to 2 + x + 3x squared.
Now let's compute H[1]. This is the hash value of bcb, and we know it's equal to 1 + 2x + x squared modulo p.
Now let's rewrite this using the multiplication-by-x property of the terms for cbd.
So it's equal to 1 + x multiplied by the first two terms for cbd, which are 2 + x. Now, we don't want to use just the first two terms for cbd; we want to use the whole expression for cbd. So we write this as follows: 1 + x multiplied by the whole expression for cbd, but now we need to subtract something to make the equality true. And that something is the last term, x multiplied by 3x squared, which is the same as 3x cubed, so we subtract 3x cubed.
Now we regroup the summands and write that this is equal to x multiplied by the hash value for cbd, which is H[2], plus 1, minus 3x cubed.
In the general case, there is a very similar formula. Here is the expression for H[i + 1]; notice that the powers of x are in each case j - i - 1, because the substring starts in position i + 1, so we subtract i + 1 from each j in the sum. The expression for H[i] is very similar, but for each power of x we subtract just i from j, because the substring starts in position i. Now let's rewrite this expression so that it is more similar to the one for H[i + 1]. To do that, we start the summation not from i but from i + 1, and also end it one position later. The first sum is now very similar to the expression for H[i + 1], except that the powers of x are all bigger by one. We also need to add T[i], which is not accounted for in the sum, and to subtract the sum's last term, because it is not in the expression for H[i]; that term is T[i plus the length of the pattern] multiplied by x to the power of the length of the pattern. Now we notice that the first sum is the same as x multiplied by the hash value for the next substring, H[i + 1], and the second and third terms are unchanged. So we get this recurrent formula: to compute H[i], if we already know H[i + 1], we multiply it by x, add T[i], and subtract T[i plus the length of the pattern] multiplied by x to the power of the length of the pattern. Notice that T[i] and T[i plus the length of the pattern] we just know, and x to the power of the length of the pattern is a multiplier that we can precompute once and use for each i.
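The recurrence can be checked on the abcbd example (hypothetical Python; the tiny p and x are chosen only so the numbers are easy to follow, not realistic choices):

```python
def poly_hash(codes, p, x):
    # direct polynomial hash of a list of integer character codes
    h = 0
    for c in reversed(codes):
        h = (h * x + c) % p
    return h

p, x = 97, 10           # tiny illustrative values
T = [0, 1, 2, 1, 3]     # integer codes of the text "abcbd"
P_len = 3

H2 = poly_hash(T[2:5], p, x)   # hash of cbd, computed directly
y = pow(x, P_len, p)           # x^|P| mod p, precomputed
# recurrence: H[i] = x*H[i+1] + T[i] - T[i + |P|] * x^|P|  (mod p)
H1 = (x * H2 + T[1] - T[1 + P_len] * y) % p
assert H1 == poly_hash(T[1:4], p, x)   # matches the direct hash of bcb
```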
Now let's use this in the pseudocode. Here's the function to precompute all the hash values of our polynomial hash function on the substrings of the text T with length equal to the length of the pattern, given the prime number p and the selected integer x. We initialize our answer, H, as an array of length equal to the length of the text minus the length of the pattern plus one, which is the number of substrings of the text with length equal to the length of the pattern. We also initialize S with the last substring of the text of that length, and we compute the hash value for this last substring directly, by calling our implementation of the polynomial hash with the substring, the prime number p and the integer x.
Then we also need to precompute the value of x to the power of the length of the pattern and store it in the variable y. To do that we initialize it with 1 and then multiply it by x, length-of-the-pattern times, taking the result modulo p each time. Then the main for loop, the second for loop, goes from right to left and computes the hash values for all the substrings of the text except the last one, for which we already know the answer. To compute H[i] given H[i + 1], we multiply it by x, then add T[i] and subtract y, which is x to the power of the length of the pattern, multiplied by T[i + length of the pattern]. And we take the expression modulo p.
And then we just return the array with the precomputed values.
To analyze its running time: the initialization of the array H and of S, together with the computation of the hash value of the last substring, takes time proportional to the length of the pattern. The precomputation of x to the power of the length of the pattern also takes time proportional to the length of the pattern. And the second for loop takes time proportional to the length of the text minus the length of the pattern. All in all, it's big O of the length of the text plus the length of the pattern.
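The PrecomputeHashes procedure just described might look like this (hypothetical Python sketch):

```python
def precompute_hashes(text, pattern_len, p, x):
    # H[i] = polynomial hash of text[i : i + pattern_len], for every valid i
    n = len(text) - pattern_len + 1
    H = [0] * n
    # hash of the last substring, computed directly with Horner's rule
    for c in reversed(text[n - 1:]):
        H[n - 1] = (H[n - 1] * x + ord(c)) % p
    y = pow(x, pattern_len, p)   # x^|P| mod p, precomputed once
    # fill the remaining entries right to left with the recurrence
    for i in range(n - 2, -1, -1):
        H[i] = (x * H[i + 1] + ord(text[i]) - y * ord(text[i + pattern_len])) % p
    return H
```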
Hi, in this video we'll use the precomputed hashes from the previous video to improve the running time of the Rabin-Karp algorithm.
And here is the pseudo code.
Actually it is very similar to the pseudocode of the initial Rabin-Karp algorithm, and only a few lines changed. So again, we choose a very big prime number p and a random number x from 1 to p - 1 to select a random hash function from the polynomial family. We initialize the result with an empty list of positions.
And we compute the hash of the pattern in the variable pHash
directly using our implementation of polynomial hash.
And then we call the PrecomputeHashes function from the previous video to precompute H, an array with the hash values of all substrings of the text with length equal to the length of the pattern. We need them to check whether it makes sense to compare the pattern to a substring: if their hashes are different, there is no point comparing them character by character, because the pattern is definitely different from the substring.
Then our main for loop goes over all i, the starting positions for the pattern, from 0 to the length of the text minus the length of the pattern, as in the previous version of the Rabin-Karp algorithm. The main thing that changed is that we compare the hash of the pattern not with a hash value computed on the fly, but with the precomputed value of the hash function for the substring starting in position i, H[i]. If they are different, it means that the pattern is definitely different from the substring starting in position i, and we don't need to compare them character by character, so we just continue to the next iteration of the for loop.
Otherwise, if the hash value of the pattern is the same as the hash value of the substring, we need to actually compare them for equality, character by character. To do that, we call the function AreEqual for the substring and the pattern. If they are actually equal, we append position i to the result, the list of all the occurrences of the pattern. Otherwise, we proceed to the next iteration. And in the end, we return the result, the list of all positions in which the pattern occurs in the text.
Let's analyze the running time of this version of the Rabin-Karp algorithm. First we compute the hash value of the pattern in time proportional to its length. Then we call the PrecomputeHashes function, which, as we estimated in the previous video, runs in time proportional to the sum of the lengths of the text and the pattern.
And then the only other thing that we do is compare the hashes, and for some of the substrings we call the function AreEqual. We already know from the previous videos that the total time spent in AreEqual is, on average, proportional to q multiplied by the length of the pattern, where q is the number of occurrences of the pattern in the text. Why is that? Because we only compare the pattern to a substring if they are equal or if there is a collision, and we can choose such a big prime p that collisions have very low probability, so on average they won't influence the running time. So, on average, the total time spent in AreEqual is proportional to q multiplied by the length of the pattern. And then the total average running time is proportional to the length of the text, plus q plus 1 multiplied by the length of the pattern.
And this is actually much better than the time for the naive algorithm, because usually q is very small; q is the number of times you actually find the pattern in the text. If you are, for example, searching for your name on a website, or for an infected code pattern in the binary code of a program, there will be no or only a few places where you actually find it. Their number is q, and it is usually much, much less than the total number of positions in the text, which is the length of the text. So the second summand, q plus 1 multiplied by the length of the pattern, is much smaller than the length of the text multiplied by the length of the pattern. And if the pattern is sufficiently long, then the first summand is also much smaller than the length of the text multiplied by the length of the pattern. So we improved our running time for most practical purposes very significantly. Of course it's only on average, but in practice this will work really well.
References
See chapters 32.1 and 32.2 in [CLRS]: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Slides
Download the slides for this lesson: 07_hash_tables_3_search_substring.pdf
Search Trees
Hello everybody, welcome back. Today, we're going to start talking about binary search trees. In
particular, we're going to talk about what the binary search tree data structure is, how it's
constructed, and the basic properties that need to be maintained.
So last time we came up with this idea of a local search problem, we wanted a data structure to be
able to solve it. And we know that none of the data structures we had seen up till this point were
sufficient to solve the problems that we wanted. But one maybe came closer than the others.
Sorted arrays were okay, in that you could actually do searches efficiently on them. But unfortunately,
you couldn't do updates in any reasonable way. But the fact that these things allowed for efficient
binary searches sort of maybe gives us a good starting point for what we're looking for.
So what we should do is look closely at this operation of binary search: what does it entail, and what exactly makes it work? And we all know how a binary search works, right? You've got your list of numbers, and you pick the one in the middle. You ask: is the thing I'm looking for bigger or smaller than this? If it's smaller, I look at the middle of the first half of the array and ask, is it bigger or smaller than that? If it's larger, I look at the middle of the second half of the array and ask the same question. And I keep asking these questions, each one narrowing down my search space, until I get an answer.
But as you'll note, sort of associated to this sort of binary search procedure is a search tree. If you sort
of consider which questions you ask. First, I ask about, is it bigger or less than seven? If it's smaller, I
ask about four. If it's bigger, I ask about 13. If I got four and said it was bigger than four, I'd then ask
about six. And I have this sort of whole tree of possibilities. Every time I ask a question it sort of splits
into two different cases.
And maybe the key idea here is that if you want to do a binary search, instead of doing it on the array,
you could just have this search tree. You start at the top of the tree, at seven. And then you head
down to 4 or 13, depending on where you go, and then you keep going down until you find your
answer.
And so in some sense, the search tree is as good as the array. But while a sorted array, as we saw, was hard to insert into, the tree is actually a lot easier to work with in that way. And it turns out this search tree is going to be the thing that allows us to implement these operations in a much better way.
Okay, so what do we need to be the case for this search tree? Well, like all binary trees, it should have a root node, and each node can have up to two children. A node has a left side, which is where you're going to go when you find out that things are smaller than it, and a right side, which is where you go when things are bigger. To be a little bit more formal, the tree is constructed out of a bunch of nodes. Each node is a data type that stores a few things. Importantly, it stores a key, the value that you're comparing things to. It also has a pointer to the parent node, a pointer to the left child, and a pointer to the right child.
And to be a search tree, it needs to satisfy one very critical property. If you look at the key of a node
X, then, well, the stuff on the left should be where you're going if you do a comparison and find the
thing you're looking for is smaller than X. And that means that, all the keys stored on all the nodes in
the left subtree of x, all the descendants of its left child, need to have a smaller key than X does.
And similarly, if you found that something was bigger than X and go to the right, it had better actually
be on the right. And so, the things whose keys are larger than X need to be on the right subtree of X.
So let's review this. We have the following three trees: A, B, and C. Which of these trees satisfies the search tree property?
Well, it turns out the only correct one is B; B works out. A has this issue that up at the top you've got this node 4, and on the left side it has everything bigger than 4, while on the right side it has everything smaller than 4. It's supposed to be the other way around, but if you switched 4's left and right sides, everything would work out there. Now case C is a little bit more subtle. There's really only one problem here: you have this root node, which is a 5, and there's a node 4 that is part of 5's right subtree. And remember, everything in the right subtree of any node has to be larger than it, and this 4 is smaller. Other than that one mistake, things are okay there as well. Okay, so this is the structure. Next time we're going to talk about how to do basic operations on binary search trees, give a little bit of pseudocode for how to do these things, and then we'll have a basic start for this project.
Basic Operations
Hello everybody, welcome back. We're continuing to talk about binary search trees. And today, we're
going to talk about how to implement the basic operations of a binary search tree. So we're going to
talk about this and talk about a few of the difficulties that show up when you're trying it. Okay, so let's
start with searching. And this is sort of the key thing that you want to be able to do on the binary
search tree. And the primary operation that we're going to look at for how to do this is what we're
going to call Find.
Now Find is a function. What it takes is a key k and the root R of a tree. And what it's going to return is
the node in the subtree with R as the root whose key is equal to k.
Okay, that's the goal and the idea is pretty easy. I mean the search tree is set up to do binary searches
on. So what we're going to do is we're going to sort of start at the top of the tree. We're going to
compare 6, the thing we're searching for, to 7. 6 is less than 7 and that means since everything less
than 7 is in its left subtree, we should look in the left subtree.
So we go left. We now compare 6 to 4, the root of this left subtree. 6 is bigger than 4. Everything bigger than 4 in the place that we're looking is going to be in 4's right subtree, so we head down that way. We now compare 6 to 6. They're equal, and so we are done with our search.
And so this algorithm is actually very easy to implement recursively. If R.key = k, then we're done. We just return the node R at the root, and that's all there is to it.
Otherwise, if R.key > k, we need something less than R, so the thing we're looking for should be in the left subtree, and we recursively run Find on k and R's left child.
On the other hand, if R.key < k, we have to look in the right subtree, so we recursively find k in the subtree of R's right child.
Okay, this works fine as long as the thing we're looking for is in the tree, but what happens if we're looking for a key that isn't there? Say we're trying to find 5 in this tree. We check: it's less than 7, it's more than 4, it's less than 6. But 6 doesn't have a left child; we have a null pointer. What do we do here? Well, in some sense we could just return some error saying the thing you were looking for wasn't there, but we did actually find something useful. We didn't find 5 in the tree, because it's not there, but we figured out where 5 should have been in the tree if it were there.
And so, if you stop your search right before you hit the null pointer, you can actually learn something useful: you find the place where k would fit in the tree.
So it makes a little bit of sense to modify this Find procedure so that if, say, R.key > k, then instead of just following the left pointer, we first check whether R actually has a left child. If R's left child isn't null, we recursively try to find k in the left subtree. But if it is null, we just stop early and return R, and we do something similar for the other case. This means that if we're searching for something that's not in the tree, we at least return something close to it. Okay, so that's one thing we can do. Another thing that we might want to do is talk about adjacent elements. If we've got some element in the tree, we might want to find the next element.
And so in particular, another function we might want, which we will call Next, takes a node N and outputs the node in the same tree with the next largest key.
And maybe one way to think about this is that instead of searching for the key N has, we should search the tree for something just a tiny bit bigger than it. Now, if N has a right child, this is kind of easy. The first bunch of steps leads you to the node N, and then you want to go right, because everything there is bigger than N. But after you do that, you keep going left, because among the things bigger than N you want the ones only a little bit bigger. You keep going until you hit a node where you can't go left any further; its left pointer is null, and that's going to be the successor.
Now, this doesn't work if N has no right child, because you can't go right from N. Looking on the left side of N doesn't work either; everything is going to be smaller there. So instead what you have to do is go up the tree. You check N's parent, and if its parent is smaller than N as well, you check the grandparent. You just keep going up until you find the first ancestor that's bigger than N, and once you have that, it will actually be the successor.
So, the algorithm for Next involves a little bit of case analysis. If N does not have a right child, we run the procedure we call RightAncestor, which goes up until we take the first step right. Otherwise, we return what we call the LeftDescendant of N's right child, which means you go left until you can't go left anymore.
Now both of these are easy to implement recursively. For LeftDescendant, if you don't have a left child, you're done; you return N. Otherwise you take one step left and repeat.
Play video starting at 5 minutes 17 seconds and follow transcript5:17
For RightAncestor, you check to see if your parent has a larger key than you; if so, you return your
parent; otherwise you go up a level and repeat, and just keep going until you find it. And putting
these together computes Next.
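Putting these pieces together, Next can be sketched in a few lines of Python. This is a minimal sketch, assuming nodes with key, parent, left, and right fields; the Node class and function names below are my own scaffolding, not code from the lecture:

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None

def left_descendant(node):
    # Keep stepping left until there is no left child.
    while node.left is not None:
        node = node.left
    return node

def right_ancestor(node):
    # Walk up until the first ancestor whose key is larger.
    while node.parent is not None and node.key > node.parent.key:
        node = node.parent
    return node.parent   # None if node already holds the largest key

def next_node(node):
    # Successor: leftmost node of the right subtree if there is one,
    # otherwise the first ancestor reached by a step to the right.
    if node.right is not None:
        return left_descendant(node.right)
    return right_ancestor(node)
```

For the node holding the largest key, right_ancestor runs off the root, so next_node returns None.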
Play video starting at 5 minutes 33 seconds and follow transcript5:33
Now it turns out that this range search operation that we talked about before, where you're given two
numbers x and y and the root of the tree, and you'd like to return a list of all the nodes whose keys are
between x and y, can be implemented pretty easily using what we already have.
Play video starting at 5 minutes 49 seconds and follow transcript5:49
So, the idea is, well, you want to find the RangeSearch, say, everything between 5 and 12. First thing
you do is you do a search for the first element in that range, in this case, it will be 6. Then you find the
next element, which is 7, and the next element, which is 10 and the next element is 13, it's too big so
you stop.
Play video starting at 6 minutes 11 seconds and follow transcript6:11
So the implementation is pretty easy. We create a list L that's going to store everything that we find,
Play video starting at 6 minutes 18 seconds and follow transcript6:18
we let N be what we get when we try to find the left endpoint x within our tree.
Play video starting at 6 minutes 24 seconds and follow transcript6:24
And then, while the key of the node N that we're working on is less than y: as long as the key is bigger
than x, we add this node to our list, and then we replace N by Next(N). We just
iterate through these nodes until they're too big, and then we return L. Okay, so that's how
you do range search.
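The loop above can be sketched as follows. Find and the successor function are inlined so the sketch runs on its own; the exact names, and the convention that Find returns the last node visited when the key is absent, are my assumptions:

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def find(key, node):
    # Standard BST search; if the key is absent, return the node
    # where the search stops.
    while True:
        if key < node.key and node.left is not None:
            node = node.left
        elif key > node.key and node.right is not None:
            node = node.right
        else:
            return node

def successor(node):
    if node.right is not None:            # leftmost node of the right subtree
        node = node.right
        while node.left is not None:
            node = node.left
        return node
    while node.parent is not None and node is node.parent.right:
        node = node.parent                # climb until we step up to the right
    return node.parent

def range_search(x, y, root):
    result = []
    node = find(x, root)
    while node is not None and node.key <= y:
        if node.key >= x:                 # find() may stop just below x
            result.append(node.key)
        node = successor(node)
    return result
```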
And nearest neighbors, you can figure it out, it's a similar idea.
Play video starting at 6 minutes 50 seconds and follow transcript6:50
Now, the interesting things are how do we do inserts and deletes? So, for insertion we want
to be given the key k and the root R of the tree and we'd like to add a node with key equal to k to our
tree.
Play video starting at 7 minutes 3 seconds and follow transcript7:03
And the basic idea is that, unlike with a sorted array, with the tree we can just take the new
element and have it hanging off one of our leaves. And this works perfectly well.
Play video starting at 7 minutes 13 seconds and follow transcript7:13
There is a bit of a technical problem here, though. We can't just have it hang off anywhere. I
mean, the three that we're inserting is smaller than seven, so it needs to be on the left side of seven. And
furthermore, there are a whole bunch of other things that need to be satisfied to keep the search
property working out.
Play video starting at 7 minutes 30 seconds and follow transcript7:30
But, fortunately for us, this find operation, when we tried to find a node that wasn't in our tree,
actually did tell us where that node should belong.
Play video starting at 7 minutes 41 seconds and follow transcript7:41
So to insert, we just find our key within R, and that gives us a node P, and the new node that we want
should be a child of P, on the left or right side as appropriate, depending on how the keys
compare.
Play video starting at 7 minutes 55 seconds and follow transcript7:55
And that's that.
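A sketch of Insert built on the same Find idea; it is self-contained, and since the lecture doesn't specify how duplicate keys are handled, this version sends equal keys to the right, which is purely my choice:

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

def find(key, node):
    # Returns the node with this key, or the would-be parent P
    # if the key is not in the tree.
    while True:
        if key < node.key and node.left is not None:
            node = node.left
        elif key > node.key and node.right is not None:
            node = node.right
        else:
            return node

def insert(key, root):
    parent = find(key, root)
    child = Node(key, parent)
    if key < parent.key:
        parent.left = child
    else:
        parent.right = child   # equal keys go right (an assumption)
```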
Play video starting at 7 minutes 58 seconds and follow transcript7:58
A little bit more difficult is Delete.
Play video starting at 8 minutes 0 seconds and follow transcript8:00
So, here we're given a node N, and we should remove it from our tree.
Play video starting at 8 minutes 5 seconds and follow transcript8:05
Now there's a problem: we can't just delete the node, because then its parent doesn't have a
child and its children don't have a parent; it breaks things apart. So we need to find some way to fill the
gap.
Play video starting at 8 minutes 17 seconds and follow transcript8:17
And there is a natural way to fill this gap. The point is you want to fill the gap with something
nearby in the sorted order, so you find the next element, X, and maybe you just take X and use it to fill
the gap that you created by deleting N.
Play video starting at 8 minutes 32 seconds and follow transcript8:32
Unfortunately, there could be a problem. Now, X, because it's the next element, is
not going to have a left child, because a left child would be even closer to N in the sorted order.
Play video starting at 8 minutes 44 seconds and follow transcript8:44
But it might have a right child, call it Y, and if it does have this right child, then by moving X out of the
way its right child is now going to be orphaned. It's not going to have a proper parent. So in addition
to moving X to fill that gap, you have to move Y up to fill the gap that you made by moving X out of the
way. But once you do that, it's actually perfectly good. You've done a reasonable rearrangement of the tree
and removed the node you wanted.
So the implementation takes a little bit of work.
Play video starting at 9 minutes 13 seconds and follow transcript9:13
First you check to see if N has a right child. If its right child is null, then we're not in the
tricky case: you can just remove N, and you need to promote N's left child, if it has one. So N's left
child should now become the child of N's parent. Otherwise, we're
going to let X be Next(N), and note that X does not have a left child.
Play video starting at 9 minutes 41 seconds and follow transcript9:41
And then we're going to replace N by X and promote X's right child to fill the gap that we made
by moving X out of the way.
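The case analysis can be sketched like this. The Tree wrapper holding a root pointer, and the replace_child helper, are my own scaffolding (the wrapper is needed so that deleting the root itself works):

```python
class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None

class Tree:
    def __init__(self, root):
        self.root = root

def replace_child(parent, old, new, tree):
    # Re-point parent's link (or the tree root) from old to new.
    if parent is None:
        tree.root = new
    elif parent.left is old:
        parent.left = new
    else:
        parent.right = new
    if new is not None:
        new.parent = parent

def delete(node, tree):
    if node.right is None:
        # No right child: promote the left child (which may be None).
        replace_child(node.parent, node, node.left, tree)
    else:
        # X = Next(node): leftmost node of the right subtree,
        # which is guaranteed to have no left child.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        # Detach X, promoting its right child into its old place.
        replace_child(succ.parent, succ, succ.right, tree)
        # Move X into node's position, adopting node's children.
        succ.left, succ.right = node.left, node.right
        if succ.left is not None:
            succ.left.parent = succ
        if succ.right is not None:
            succ.right.parent = succ
        replace_child(node.parent, node, succ, tree)
```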
Play video starting at 9 minutes 50 seconds and follow transcript9:50
But this all works. Just to review it: if we have the following tree and we're deleting the highlighted
node, which of the following three trees do we end up with? Well, the answer here is C. The point is
that we deleted 1, so we want to replace it with the next element, which is 2. So we took 2 and put it
in the place where 1 was. Now 2's child, 4, needs to be promoted, so 4 now becomes the new child of
6, and everything works nicely in this tree.
Play video starting at 10 minutes 26 seconds and follow transcript10:26
Okay, so that tells you how to implement the basic operations for binary search trees. Next time we're going to
talk about the runtime of these operations, which is going to lead us to some interesting ideas
about the balance of these trees.
Balance
Hello, everybody, welcome back.
We're continuing to talk about binary search trees and today,
we're going to talk about balance.
Play video starting at 8 seconds and follow transcript0:08
In particular, we're actually going to look at sort of the basic runtime
of these operations that we talked about in the last lecture and from that,
we're going to notice that they'll sometimes be a little bit slow.
And to combat this, we want to make sure that our trees are balanced and
well, doing that's a bit tough and
we're going to talk a little bit about how we're going to do this with rotations.
Okay, so first off we've got this great operation, it can do local searches,
but how long do these operations take?
Play video starting at 39 seconds and follow transcript0:39
And maybe a key example is the Find operation.
So we'd like to find 5 in the following tree.
We compare it to 7, and it's less than 7, and it's bigger than 2, and
it's bigger than 4, and it's less than 6, and it's equal to 5, and we found it.
And you'll note that the amount of work that we did we sort of had to traverse
all the way down the tree from the root to the node that we're searching for and
we had to do a constant amount of work at every level.
So the number of operations that we had to perform was O of the Depth of the tree.
Now, just to sort of make sure we get this: if we have the following tree, we could be searching for
different nodes A, B, C or D. Which ones are faster to search for than which others? Well, A is
the fastest. It's up at the root. Then D has depth only three, B has depth four, and C has depth five,
and so A is faster than D, which is faster than B, which is faster than C.
Play video starting at 1 minute 37 seconds and follow transcript1:37
So the runtime is the depth of the node that you're looking for. But it's unfortunate, the depth can
actually be pretty bad. In this example, we have only ten nodes in the tree, but this 4 at the bottom
has depth 6. In fact, if you think about it, things could be even worse. A tree on n nodes could have
depth n if the nodes are just sort of strung out in some long chain.
Play video starting at 2 minutes 1 second and follow transcript2:01
And so this is maybe sort of a problem: maybe our searches only run in O(n) time, and we're not
any better than any of these other data structures that didn't really work. On the other hand, even
though the depth can be very bad, it can also be much smaller. This example has the same ten nodes in it,
but the maximum depth is only four. And so by rearranging your tree, maybe you can make the
depth a lot smaller.
Play video starting at 2 minutes 28 seconds and follow transcript2:28
And in particular, what you realize is that in binary search, in order for it
to be efficient, we wanted to guess the thing in the middle, because then no matter which answer we
got, we cut our search space in two. And what this means for a binary search tree is that at any node,
you want the things on the left and the things on the right, that is, those two
subtrees, to have approximately the same size.
Play video starting at 2 minutes 53 seconds and follow transcript2:53
And this is what we mean by balance. And if you're balanced, suppose that you're perfectly balanced,
everything is exactly the same size, then this is really good for us. Because it means that each subtree
has half the size of sort of the subtree of its parent.
Play video starting at 3 minutes 9 seconds and follow transcript3:09
And that means after you go down, logarithmically, many levels the subtrees have size one and you're
just done.
Play video starting at 3 minutes 16 seconds and follow transcript3:16
And so, if your tree is well balanced, operations should run in O(log(n)) time, which is really what we
want.
Play video starting at 3 minutes 24 seconds and follow transcript3:24
But there's a problem with this, that if you make insertions they can destroy your balance properties.
We start with this tree, it's perfectly well balanced and just has one node I guess but we insert two
and then we insert three and then we insert five and then we insert four and you'll note that suddenly
we've got a very, very unbalanced tree. But all we did were updates. So, somehow we need a way to
get around this. We need a way to do updates without unbalancing the tree.
Play video starting at 3 minutes 55 seconds and follow transcript3:55
And the basic idea for how we're going to do this, is we're going to want to have some mechanism by
which we can rearrange the trees in order to maintain balance.
Play video starting at 4 minutes 4 seconds and follow transcript4:04
And there's one problem with this, which is that however we rearrange the tree, we have to maintain
the sorting property. We have to make sure that it's still sorting correctly or none of our other
operations will work. And well, there's a key way to do this, and this is what's known as rotation.
Play video starting at 4 minutes 24 seconds and follow transcript4:24
The idea is you've got two nodes, X and Y, where X is Y's parent, and there's a way to switch them so
that instead Y is X's parent. The sub-trees A, B, and C that hang off of X and Y
need to be rearranged a little bit to keep everything still sorted. But this is a very local
rearrangement; you can go back and forth, it keeps the sorting structure working, and it
rearranges the tree in some hopefully useful way.
Play video starting at 4 minutes 53 seconds and follow transcript4:53
So just to be clear about how this works, it takes a little bit of bookkeeping, but that's about it. You let
P be the parent of X, Y be X's left child, and B be Y's right child.
Play video starting at 5 minutes 3 seconds and follow transcript5:03
And then what we're going to do is reassign some pointers. P is the new parent of Y
and Y is its child; Y is the new parent of X and X is its right child; X is the new parent of B and B is X's
new left child. And once you've rearranged all those pointers, everything actually works.
This is a nice constant time operation and it does some useful rearrangements.
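The pointer reassignments for RotateRight can be sketched as below; the Node and Tree classes are my own scaffolding (the Tree wrapper is there so rotating the root works), and RotateLeft is the mirror image:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = self.left = self.right = None

class Tree:
    def __init__(self, root):
        self.root = root

def rotate_right(x, tree):
    p, y = x.parent, x.left          # P: parent of X; Y: X's left child
    b = y.right                      # B: Y's right subtree
    y.parent = p                     # P becomes the new parent of Y
    if p is None:
        tree.root = y
    elif p.left is x:
        p.left = y
    else:
        p.right = y
    y.right, x.parent = x, y         # Y becomes the new parent of X
    x.left = b                       # X becomes the new parent of B
    if b is not None:
        b.parent = x
```

This is a constant number of pointer updates, matching the claim that a rotation is a constant-time operation.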
Play video starting at 5 minutes 27 seconds and follow transcript5:27
So what we really need to do though is we need to have a way to sort of use these operations to
actually keep our tree balanced and we're going to start talking about how to do that next time when
we discuss AVL trees.
Slides
Download the slides for this lesson:
08_binary_search_trees_1_intro.pdfPDF File
08_binary_search_trees_2_binary_search_trees.pdf PDF File
08_binary_search_trees_3_basic_ops.pdf PDF File
08_binary_search_trees_4_balance.pdfPDF File
References
See chapter 12 in [CLRS]: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford
Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
AVL Trees
Hello everybody, welcome back.
We're still talking about binary search trees but
today we're going to talk about AVL trees.
And AVL trees are just sort of a specific way of maintaining balance in your
binary search tree.
And we're just going to talk about sort of the basic idea.
Next lecture we're going to talk about how to actually implement them.
Play video starting at 20 seconds and follow transcript0:20
But okay, what's the idea?
We learned last lecture that in order for
our search operations to be fast, we need to maintain balance of the tree.
But before we can do that we first need a way to measure the balance of the tree so
that we can know if we're unbalanced and know how to fix it.
Play video starting at 38 seconds and follow transcript0:38
And a natural way to do this is by what's called the height of a node.
So if you have a node in the tree,
its height is the maximum length of a path from that node to a leaf of your tree.
Play video starting at 51 seconds and follow transcript0:51
Fair enough.
So for
example if we have the following tree, what's the height of the highlighted node?
Play video starting at 59 seconds and follow transcript0:59
Well, this node has height six: the following path of length six leads down
from this node, and it turns out there's nothing longer.
So, we can define height recursively in a very easy way. If you have a leaf, its height is one, because
you're just there and you can't go any further.
Play video starting at 1 minute 18 seconds and follow transcript1:18
Otherwise, the longest path downwards goes either through the longest path on
your left side or the longest path on your right side. So you want to take the maximum of the height
of your left child and the height of your right child, and then you need to add one to that, because the node itself
gets added to the path.
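That recursive definition is just a few lines in Python. A minimal sketch, using the lecture's convention that a leaf has height 1 (so an empty subtree naturally has height 0); the Node class is my own scaffolding:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def height(node):
    # An empty subtree has height 0, so a leaf gets height 1,
    # matching the convention used in this lecture.
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))
```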
Play video starting at 1 minute 37 seconds and follow transcript1:37
Okay, that's fine. Now, in order to actually make use of this height, we're going to want to
add a new field to our nodes. So, the nodes that made up our tree previously stored a key and
pointers to the parent, the left child, and the right child, and now they also need to store another
piece of data: the height of the node.
Play video starting at 1 minute 55 seconds and follow transcript1:55
And note that we are actually going to have to do some work, and we'll talk a little bit about how to
do this later, to ensure that this height field is actually kept up to date. We can't just store it as a
number and leave it there forever; if we rearrange the tree, we might need to change the heights.
In any case, back to balance. Height is a very rough measure of the size of a sub-tree.
Play video starting at 2 minutes 18 seconds and follow transcript2:18
For things to be balanced, we want the size of the two sub-trees of the left and right children of any
given node to be roughly the same.
Play video starting at 2 minutes 25 seconds and follow transcript2:25
And so there's an obvious way to do this. We'd like to force the heights of these children to be
roughly the same. So the AVL property is the following.
Play video starting at 2 minutes 35 seconds and follow transcript2:35
For all nodes N in our tree, we would like it to be the case that the difference between the height of
the left child and the height of the right child is at most one.
Play video starting at 2 minutes 45 seconds and follow transcript2:45
And we claim that if you can maintain this property for all nodes in your tree, this actually ensures
that your tree is reasonably well balanced.
Play video starting at 2 minutes 55 seconds and follow transcript2:55
Okay, and so really what we'd like to know is that if you have the AVL property on all nodes, then the
total height of the tree should be logarithmic. It should be O(log(n)). So basically what we want to say
is that if you have an AVL tree and it doesn't have too many nodes, then the height is not too big.
Play video starting at 3 minutes 15 seconds and follow transcript3:15
But it turns out that the easier way to get at this is to turn it on its head. We want to show instead
that if you have an AVL tree whose height is big, then it has to have many nodes.
Play video starting at 3 minutes 28 seconds and follow transcript3:28
And this we can do.
So we're going to prove the following theorem. Suppose that you have an AVL tree, a tree satisfying
the AVL property, and N is a node of this tree with height h.
Play video starting at 3 minutes 41 seconds and follow transcript3:41
Then the claim is that the sub-tree of N has to have size at least the Fibonacci number F_h. And
just to review, we talked about Fibonacci numbers way back in the introductory unit,
for the previous course in this sequence. This is just a sequence of numbers: the zeroth one is
zero, the first is one, and thereafter each Fibonacci number is the sum of the previous two.
Now, these are a nice, predictable sequence, and they grow pretty fast: the nth Fibonacci number is at
least two to the n over two for all n at least 6.
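This growth claim is easy to sanity-check numerically; a quick sketch (the fib helper below is mine, not course code):

```python
def fib(n):
    # Iterative Fibonacci: F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

Checking, for instance, n = 6: F(6) = 8 and 2^(6/2) = 8, and from there the Fibonacci numbers pull ahead.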
Play video starting at 4 minutes 17 seconds and follow transcript4:17
Okay so let's look at the proof, we're going to do this by induction on the height of our node.
Play video starting at 4 minutes 22 seconds and follow transcript4:22
If the node we're looking at has height one, it's a leaf, and its sub-tree has one node, which is the first
Fibonacci number. Great.
Play video starting at 4 minutes 32 seconds and follow transcript4:32
Next, the inductive step: if you've got some node of height h, then by the definition of
height, at least one of your two children must have height h minus one, and by the AVL property,
the other child has height at least h minus two. So by the inductive hypothesis, the total
number of nodes in this sub-tree is at least the (h-1)st Fibonacci number plus the (h-2)nd Fibonacci
number, which equals the hth Fibonacci number. And that completes the proof.
Play video starting at 5 minutes 6 seconds and follow transcript5:06
So what does this mean? It means that if a node in our tree has height h, the sub-tree of that node
has size at least two to the h over two. But if our tree only has n nodes, two to the h over two can't
be more than n. So the height can't be more than two log base two of n, which is O(log n).
Play video starting at 5 minutes 28 seconds and follow transcript5:28
And so the conclusion is that if we can maintain the AVL property, we can perform all of our find and
related operations in O(log(n)) time. And so next lecture we're going to talk about how to maintain
this property, but this is the key idea: if we can maintain this property, we have a balanced tree
and things should be fast. So I'll see you next time as we discuss how to ensure that this happens.

Okay! But the key thing we still haven't really touched: we need to figure out how to do the
rebalancing. So you have a node whose left child is heavier than its right child; the left child's height is exactly
two more than the right child's.
Play video starting at 4 minutes 39 seconds and follow transcript4:39
And the basic idea is the left child is bigger, it needs to be higher up, so we should just rotate
everything right.
Play video starting at 4 minutes 46 seconds and follow transcript4:46
And it turns out that in a lot of cases this is actually enough to solve the problem, but there is one case
where it doesn't work. So B is the node we're trying to rebalance, A is its left child, which is too heavy,
and we're going to assume that A is too heavy because its right child has some large height, n+1. The
problem is that if we just rotate B to the right, then this subtree of height n+1 switches sides of the tree
when we perform the rotation: it switches from being A's child to being B's child. And when we do this,
we've switched our tree from being unbalanced at B to being unbalanced at A, in the other
direction.
Play video starting at 5 minutes 28 seconds and follow transcript5:28
And so, just performing this one rotation doesn't help here.
Play video starting at 5 minutes 32 seconds and follow transcript5:32
In this case the problem is that A's right child, which we'll call X, was too heavy. So the first thing we
need to do is move X higher up. So what you can do is, instead of just doing the rotation at B, first you
rotate A to the left, then you rotate B to the right. And then with some case analysis
you can check that after you do this, you've actually fixed all the problems that you had.
Play video starting at 5 minutes 58 seconds and follow transcript5:58
And it's good.
Play video starting at 6 minutes 0 seconds and follow transcript6:00
The operation for rebalancing right is: you let M be the left child of N, and then you check whether
we're in this other case. If M's right child has height more than M's left child, you rotate M
to the left. Then, no matter what, you rotate N to the right. And finally, for all the nodes that you
rearranged in this procedure, you need to adjust their heights to make sure
that everything works out. Once you do this, it rebalances things at that node properly, it sets all
the heights to what they should be, and it's good.
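Putting the two rotations and the height adjustments together, RebalanceRight can be sketched as below. Everything here is my own scaffolding built to match the description: the stored height field, the set_child helper, and rotations that fix heights as they go are assumptions, not course code:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.parent = self.left = self.right = None
        self.height = 1

class Tree:
    def __init__(self, root):
        self.root = root

def h(node):
    # Height of a possibly-empty subtree; a leaf has height 1.
    return node.height if node is not None else 0

def adjust_height(node):
    node.height = 1 + max(h(node.left), h(node.right))

def set_child(parent, old, new, tree):
    if parent is None:
        tree.root = new
    elif parent.left is old:
        parent.left = new
    else:
        parent.right = new
    if new is not None:
        new.parent = parent

def rotate_right(x, tree):
    y = x.left
    set_child(x.parent, x, y, tree)
    x.left = y.right
    if x.left is not None:
        x.left.parent = x
    y.right, x.parent = x, y
    adjust_height(x)          # x is now lower, so fix its height first
    adjust_height(y)

def rotate_left(x, tree):
    # Mirror image of rotate_right.
    y = x.right
    set_child(x.parent, x, y, tree)
    x.right = y.left
    if x.right is not None:
        x.right.parent = x
    y.left, x.parent = x, y
    adjust_height(x)
    adjust_height(y)

def rebalance_right(n, tree):
    # Precondition: n's left subtree is two levels taller than its right.
    m = n.left
    if h(m.right) > h(m.left):
        rotate_left(m, tree)  # the zig-zag case: lift m's heavy right child
    rotate_right(n, tree)
```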
Play video starting at 6 minutes 36 seconds and follow transcript6:36
Okay, so that's how insert works. Next, we need to talk about delete. And the thing is, deletions can
also change the balance of the tree. Remember, generally what we do for deletions is we remove
the node, but we replace it by its successor and then promote the successor's child.
Play video starting at 6 minutes 54 seconds and follow transcript6:54
And the thing to note is that when you do this, at the spot in the tree where the successor was,
the height at that location decreased by one. Because instead of having the successor and
then its child and so on, you just have the child and so on.
Play video starting at 7 minutes 9 seconds and follow transcript7:09
And this of course can cause your tree to become unbalanced even if it were balanced beforehand.
So we of course need a way to fix this, but there's a simple solution. You delete the node N as before.
You then let M be the left child of the node that replaced N, the thing that might have unbalanced the
tree. And then you run the same rebalance operation that we did for our insertions, starting at M and
then filtering its way up the tree. And once you've done that, everything works.
Play video starting at 7 minutes 41 seconds and follow transcript7:41
And so what we've done is we've shown that you can maintain this AVL property, and you can do it
pretty efficiently: all of our rebalancing work was only O(1) work per level of the tree. And so
if you can do all of this, we can actually perform all of our basic binary search tree operations in O(log n)
time per operation, using AVL trees. And this is great. We really do have a good data structure
now for these local search problems.
Play video starting at 8 minutes 8 seconds and follow transcript8:08
So that's all for today. Coming next lecture, we are going to talk about a couple of other useful
operations that you can perform on binary search trees.
Now if we didn't care about balance, that would be all we'd have to say about this operation.
Unfortunately, this merge operation doesn't preserve balance properties. And the problem is that,
well, the two trees, you didn't really touch them very much, so they stay balanced. But when you stick
them both together under the same root, well, if one tree is much, much bigger than the other,
suddenly the root is very, very unbalanced, and this is a problem.
Slides
Download the slides for this lesson:
08_binary_search_trees_5_avl.pdf
08_binary_search_trees_6_avl2.pdf
References
See chapters 5.11.1 and 5.11.2 here.
https://en.wikipedia.org/wiki/AVL_tree
See this visualization. Play with this AVL tree by adding and deleting elements to see how it
manages to keep being balanced.
Applications
Hello everybody, welcome back. Today we're going to talk more about binary search trees. In particular,
we're going to give a couple of applications to computing order statistics, and then an
additional use of binary search trees: to store sorted lists.
Play video starting at 21 seconds and follow transcript0:21
Okay, so there are some questions that you might want to ask. You've got a bunch of elements that are
stored in this binary search tree data structure, sorted by some ordering. Things we might
want to do: we might want to find the 7th largest element, or maybe we want the median element,
or the 25th percentile element. Now, these are all instances of an order statistic problem. Given
the root of a tree T and a number k, we should be able to return the
kth smallest element stored in the tree.
Play video starting at 53 seconds and follow transcript0:53
So, the basic idea is that this is sort of a search problem. We sort of should treat it like one. But to do
that we need to know which subtree to search in.
Play video starting at 1 minute 3 seconds and follow transcript1:03
So I mean, is the kth smallest element in the left subtree? Well, the left subtree does store a bunch
of the smallest elements, but the real question is, does it store k of them? If it stores at least k, the
kth smallest element will be in there; otherwise it won't be. So the thing that we need to know is: how
many elements are in the left subtree?
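If each node additionally stores the size of its subtree, the search described here becomes a short recursion. A sketch under that assumption; keeping the size field up to date as the tree changes is assumed rather than shown, and all names here are mine:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Number of nodes in this subtree; a real implementation
        # must maintain this on every insert, delete, and rotation.
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def order_statistic(node, k):
    # Return the kth smallest node (k is 1-based).
    left_size = node.left.size if node.left else 0
    if k == left_size + 1:
        return node
    if k <= left_size:
        return order_statistic(node.left, k)
    return order_statistic(node.right, k - left_size - 1)
```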
And this requires sort of a slightly different way of thinking about binary search trees.
Play video starting at 6 minutes 17 seconds and follow transcript6:17
Up until now we thought of a search tree as having a bunch of elements stored in it, allowing
you to do searches on them. Somehow you're given the keys, and the search tree allows
you to find them.
Play video starting at 6 minutes 29 seconds and follow transcript6:29
But there's another way to do it. And maybe a good way to illustrate it, is by looking at the logo for
our course. So this is a binary search tree. Every node has a letter in it. But you'll note that this isn't
sorted in terms of these letters. For example, O is the left child of I, but O comes after I in alphabetical
order. These things aren't sorted alphabetically.
Play video starting at 6 minutes 53 seconds and follow transcript6:53
On the other hand, you'll note that the binary search tree structure actually does tell you what order
these letters are supposed to be going in.
Play video starting at 7 minutes 1 second and follow transcript7:01
I mean, the smallest thing here should be A, because it's the left child, of the left child, of the left
child, of the left child.
Play video starting at 7 minutes 7 seconds and follow transcript7:07
Then L is the next smallest, then G, then O, R, I, then T, H, M, S. And so, it tells us what order these
letters are supposed to go in, and they spell ALGORITHMS.
Play video starting at 7 minutes 20 seconds and follow transcript7:20
And that's the basic idea, that you can use a tree to store some sort of sorted list of things, in a
convenient way.
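In code, "the order the tree tells you" is just an in-order traversal: visit the left subtree, then the node, then the right subtree. A minimal sketch with a made-up example tree:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def in_order(node):
    # The left subtree's sequence, then this node, then the right's.
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)
```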
So, for example, we have the following tree. There are no actual keys stored in the nodes, but of the labels A, B,
C, D, and E, one of them is the 5th smallest element in the tree. Which one is it?
Play video starting at 7 minutes 42 seconds and follow transcript7:42
Well, it's D. You sort of count up from the smallest: a few nodes on the left, then B, then another one, and D is the
5th smallest.
Play video starting at 7 minutes 50 seconds and follow transcript7:50
Okay, so what's the point? How are we going to use this to implement our flip arrays? What we're going to do
is, instead of storing the sequence of black and white cells as an array, we're going to store it as a list.
And in this list we're going to have a bunch of nodes, colored black or white, fine.
Play video starting at 8 minutes 9 seconds and follow transcript8:09
Actually, there's a bit of a clever thing. We're actually going to want two trees, one with the normal
colors, and one with the opposite colors. And the reason for this is that when we want to do flips, we
want to be able to replace things with their opposite colors. So, it helps to have everything's opposite
already stored somewhere.
Play video starting at 8 minutes 28 seconds and follow transcript8:28
But now comes the really clever bit. If we wanted to do this flip operation, say we wanted to take the
last three elements of our tree and flip all of their colors, well this second tree, this dual tree, the last
three elements of that tree have the opposite colors. So all that we need to do is swap the last three
elements of the tree on the left with the last three elements of the tree on the right, and we have
effectively swapped those colors.
Play video starting at 8 minutes 56 seconds and follow transcript8:56
And what's even better is that using these merge and split operations from last time, we can actually
do this.
Play video starting at 9 minutes 4 seconds and follow transcript9:04
So let's see how this is implemented.
Firstly, to create this thing, we just build two trees, where in T1 all of the things are colored white, and
in T2 they're all colored black. Great. To find the color of the mth node, you just find the mth node
within T1 and return its color.
Play video starting at 9 minutes 23 seconds and follow transcript9:23
Great. The flip operation is the interesting bit. If we want to Flip(x), what you do is, you split T1 at x
into two halves, and you split T2 at x, and then you merge them back together. But you merge the left
half of T1 with the right half of T2, and you merge the left half of T2 with the right half of T1. And
that effectively swaps the colors of the last elements, and it works. And so, as the moral, trees can
actually be used for more than just performing searches on things. We can use them to store these
sorted lists, and merge and split then become very interesting operations, in that they allow us to
recombine these lists in useful ways. Okay, so that's all for these applications. Next time we're going
to give an optional lecture, and it's going to talk about an alternative way to implement many of
these useful binary search tree operations. So, please come to that, it'll be interesting, but it's not
really required. It's another way to do some of this stuff. But, I hope I'll see you there.
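As a sanity check on this logic, here is a minimal sketch in Python. It stands plain Python lists in for the two trees, so split and merge here are O(n) slices; with the real merge and split operations on balanced trees from last time, the same flip runs in O(log n). All the names (build, color, flip) are ours for illustration, not from the slides.

```python
def build(n):
    # T1 holds the "real" colors (all white), T2 the opposite colors.
    t1 = ["white"] * n
    t2 = ["black"] * n
    return t1, t2

def color(t1, m):
    # The color of the m-th element (1-based) is read off T1.
    return t1[m - 1]

def flip(t1, t2, x):
    # Split both trees at position x, then cross-merge:
    # left half of T1 + right half of T2, and left half of T2 + right half of T1.
    l1, r1 = t1[:x], t1[x:]
    l2, r2 = t2[:x], t2[x:]
    return l1 + r2, l2 + r1

t1, t2 = build(5)
t1, t2 = flip(t1, t2, 2)   # flip the colors of the last 3 elements
```

After the flip, T1 reads white, white, black, black, black, and T2 holds exactly the opposite colors, so a later flip of any suffix still works the same way.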
Slides
Download the slides for this lesson:
References
See chapters 14.1 and 14.2 in [CLRS] Thomas H. Cormen, Charles E. Leiserson, Ronald L.
Rivest, Clifford Stein. Introduction to Algorithms (3rd Edition). MIT Press and McGraw-Hill, 2009.
Splay Tree
Splay Trees: Introduction
Ppt slides
Hello everybody. Welcome back. Today we are going to talk about something a little different.
Play video starting at 6 seconds and follow transcript0:06
Up until this point, we've talked about AVL trees, we've talked about how to keep them balanced, and
how to use them to implement all of our binary search tree operations in O(log n) time per
operation. But it turns out that there are a wide number of different binary search tree structures
that give you different ways to ensure that your trees are balanced; there are treaps, there are red-
black trees, and today we're going to talk about splay trees as sort of another example of the types of
things that you can do.
Play video starting at 35 seconds and follow transcript0:35
And to motivate this, suppose that you're searching for random elements, one after the other. You
can actually show that no matter what search tree or data structure you use, it will always take
at least log n time per operation. That's actually the best you can do.
Play video starting at 54 seconds and follow transcript0:54
However, if some items are searched for more frequently than others, you might be able to do better.
If you take the frequently queried items and put them close to the root, those items will be faster to
search for. And some of the other items might be a little bit slower, but you should still be okay.
Play video starting at 1 minute 13 seconds and follow transcript1:13
To compare, for example, we've got the unbalanced tree and the balanced one with the same entries.
But you'll note that 1, 5 and 7 are much higher up in the unbalanced tree. Now if we search for
random things, if we search for 11, 11 is much higher up in the balanced tree than the unbalanced
one, so the unbalanced one is slower there. But when we search for 1, it takes a lot less time in the
unbalanced case; we search for one again, we search for seven, it's again a lot cheaper in the
unbalanced case. And if we do some sequence of searches, well, it might turn out that it's actually
cheaper to use the unbalanced tree than the balanced one, if the elements that tend to be higher up
in the unbalanced tree are searched for more frequently than other elements.
Ppt slides
And then you actually just rotate the node up. And so if we combine these operations together, we get
what's called the splay operation. If you want to splay a node N, and this is a way of bringing the node
N to the root of the tree, you determine which case you are in, the Zig-Zig, the Zig-Zag, or the Zig case.
You apply the appropriate local operation to rearrange the tree.
Play video starting at 5 minutes 24 seconds and follow transcript5:24
And then if the parent of the node is not null, i.e., if the node is not the root of the tree yet, you splay
N again. And you just keep splaying until it becomes the root. Okay, so to make sure that we're on
the same page with this:
if we take the tree up top and we splay the highlighted node number 4, which of these three trees, A,
B, or C, do we end up with afterwards?
Play video starting at 5 minutes 53 seconds and follow transcript5:53
Well, the answer here is A. So, the point is, we start in this configuration, and we note that we're
originally in the zig-zig case: two, three, and then four. So we elevate four, such that three and
then two come down from it as children. And then we're in the zig-zag case. One and five are on
opposite sides of four, so we elevate four; one and five are its new children, and three and two now
hang off of one. And that is exactly what we were supposed to end up with, and so that is the answer
to this question.
Play video starting at 6 minutes 27 seconds and follow transcript6:27
Okay, so that's what the splay operation is. Next time we're going to talk about how to use the splay
operation to rebalance your search tree, and how to use it to perform all the basic binary search tree
operations efficiently. So I'll see you then.
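To make the three cases concrete, here is a sketch of the splay operation on a plain BST with parent pointers. The Node, insert, and inorder scaffolding is our own; only the case analysis inside splay follows the lecture.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def insert(root, key):
    # Naive (unbalanced) BST insert; returns (root, new node).
    node = Node(key)
    if root is None:
        return node, node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                break
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                break
            cur = cur.right
    node.parent = cur
    return root, node

def rotate(x):
    # Rotate x above its parent, preserving the BST order.
    p, g = x.parent, x.parent.parent
    if p.left is x:                      # right rotation
        p.left = x.right
        if x.right:
            x.right.parent = p
        x.right = p
    else:                                # left rotation
        p.right = x.left
        if x.left:
            x.left.parent = p
        x.left = p
    p.parent = x
    x.parent = g
    if g:
        if g.left is p:
            g.left = x
        else:
            g.right = x

def splay(x):
    # Apply zig-zig / zig-zag / zig steps until x becomes the root.
    while x.parent:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                    # zig: parent is the root
        elif (g.left is p) == (p.left is x):
            rotate(p)                    # zig-zig: rotate the parent first,
            rotate(x)                    # then the node itself
        else:
            rotate(x)                    # zig-zag: rotate the node
            rotate(x)                    # up twice
    return x

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []

root, node = insert(None, 5)
for k in [4, 3, 2, 1]:
    root, node = insert(root, k)         # builds the left chain 5-4-3-2-1
root = splay(node)                       # node holds key 1 after the loop
```

Splaying the deepest node of the chain brings it to the root while leaving the in-order sequence 1 through 5 intact, which is exactly the point: the rotations rearrange depths but never break the search-tree order.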
PPT Slides
Hello everybody. Welcome back. Today we're still talking about splay trees and we're actually going to
go into a little bit of the math behind analyzing their run times.
Play video starting at 10 seconds and follow transcript0:10
So remember last time we analyzed splay trees and in order to do so we needed the following
important result, that the amortized cost of doing O(D) work and then splaying a node of depth D is
actually O(log(n)) where n is the total number of nodes in the tree.
Play video starting at 28 seconds and follow transcript0:28
And today we're going to prove that.
Play video starting at 30 seconds and follow transcript0:30
So to do this, of course, we need to amortize: we need to pay for this extra work by doing something to
make the tree simpler. And the way we talk about the tree being simple is we're going to pick a potential
function, so that if we do a lot of work, it's going to pay for itself by decreasing this potential.
Play video starting at 49 seconds and follow transcript0:49
And it takes some cleverness to find the right one, and it turns out, more or less, the right potential
function is the following. We define the rank of a node N to be the log of the size of its subtree,
where the size of its subtree is just the number of nodes that are descendants of N in that tree.
Play video starting at 1 minute 8 seconds and follow transcript1:08
Then the potential function P for the tree is just the sum over all nodes N in the tree of the rank of
N. Now, to get a feel for what this means: if your tree is balanced, or even approximately balanced,
the potential function should be approximately linear in the total number of nodes. But if, on the
other hand, it's incredibly unbalanced, say just one big chain of nodes, then the potential could be as
big as n·log(n). And so, a very large potential function means that your tree is very unbalanced. And
so, if you are decreasing the potential, it means that you're rebalancing the tree.
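A quick numerical illustration of this contrast (the code and names are ours, not from the slides): in a path of n nodes the subtree sizes are exactly 1, 2, ..., n, so the potential is log2(n!), which is Θ(n log n), while for a perfectly balanced tree it comes out roughly linear in n.

```python
import math

def phi_chain(n):
    # Potential of a path of n nodes: subtree sizes are 1, 2, ..., n,
    # so Phi = log2(n!) = Theta(n log n).
    return sum(math.log2(k) for k in range(1, n + 1))

def phi_balanced(n):
    # Potential of a perfectly balanced tree on n nodes: the root's
    # subtree has size n, and the two halves are handled recursively.
    if n == 0:
        return 0.0
    half = (n - 1) // 2
    return math.log2(n) + phi_balanced(half) + phi_balanced(n - 1 - half)
```

For n = 1023 the chain's potential is several times larger than the balanced tree's, matching the Θ(n log n) versus roughly-linear contrast described above.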
Play video starting at 1 minute 46 seconds and follow transcript1:46
So what we need to do is we need to see what happens when you perform a splay operation, what
does it do to the potential function.
Play video starting at 1 minute 54 seconds and follow transcript1:54
Now, to do that, note that the splay operation is composed of a bunch of these little operations, zigs,
zig-zigs and zig-zags, and we want to know, for each operation, what it does to the potential.
Play video starting at 2 minutes 7 seconds and follow transcript2:07
So for example when you perform a zig operation how does the potential function change?
Play video starting at 2 minutes 14 seconds and follow transcript2:14
Well, you'll note that other than N and P, the two nodes that were directly affected, none of the
nodes have their subtrees changed at all. And therefore the ranks of all the other nodes stay exactly
the same.
Play video starting at 2 minutes 28 seconds and follow transcript2:28
So the change in potential function is just the new rank of N plus the new rank of P, minus the old
rank of N and the old rank of P.
Play video starting at 2 minutes 37 seconds and follow transcript2:37
Now, the new rank of N and the old rank of P are actually the same, because each of those had
subtrees that were just the entire tree.
Play video starting at 2 minutes 46 seconds and follow transcript2:46
And so this is just the new rank of P minus the old rank of N, and it's easy to see that's at most the
new rank of N minus the old rank of N.
Play video starting at 2 minutes 55 seconds and follow transcript2:55
That's not so bad.
Play video starting at 2 minutes 57 seconds and follow transcript2:57
Now let's look at the zig-zig analysis which is a little bit trickier.
Play video starting at 3 minutes 1 second and follow transcript3:01
So here the change in the potential is the new rank of N plus the new rank of P plus the new rank of Q
minus the old rank of N, and the old rank of P and the old rank of Q. So the new ranks minus all the
old ranks. Now the claim here is that this is at most 3 times the new rank of N minus the old rank of N
minus 2. And to prove this we need a few observations.
Play video starting at 3 minutes 27 seconds and follow transcript3:27
The first thing is that the new rank of N is equal to the old rank of Q, and that this term is actually
bigger than any other term in our expression. And that's simply because, I mean, both of these nodes,
what are their subtrees? Well, it's N, P and Q, and then the red, green, blue and black subtrees. They're
the same subtrees, the same size; they've got the same rank. But the next thing to note is that the old
size of N's subtree and the new size of Q's subtree, when you add them together, it's going to be the
red subtree, the green subtree, the blue subtree, and the black subtree plus two more nodes.
Play video starting at 4 minutes 7 seconds and follow transcript4:07
And that's actually one less than the size of either of these two big terms.
Play video starting at 4 minutes 14 seconds and follow transcript4:14
And what that says, when you take logarithms, you can actually get that the old rank of N plus the
new rank of Q is at most twice the new rank of N minus 2.
Play video starting at 4 minutes 28 seconds and follow transcript4:28
Because they're sort of half the size each.
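Spelled out in symbols (the notation is ours: $s(\cdot)$ for subtree sizes, $r(\cdot) = \log_2 s(\cdot)$ for ranks, primes for values after the zig-zig), the argument is:

```latex
\begin{align*}
s(N) + s'(Q) &= s'(N) - 1 < s'(N),\\
r(N) + r'(Q) &\le 2\log_2\!\frac{s(N) + s'(Q)}{2} \le 2\,r'(N) - 2
  \qquad\text{(concavity of $\log$)},\\
\Delta\Phi &= r'(P) + r'(Q) - r(N) - r(P)
  \qquad\text{(since $r'(N) = r(Q)$)}\\
&\le r'(N) + r'(Q) - 2\,r(N)
  \;\le\; 3\bigl(r'(N) - r(N)\bigr) - 2.
\end{align*}
```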
Play video starting at 4 minutes 31 seconds and follow transcript4:31
And therefore, if you combine these inequalities together, you can actually get the one that we
wanted on the previous slide. Now, the zig-zag analysis is pretty similar to this. Here, you can show
the change in potential is at most twice the new rank of N minus the old rank of N minus 2. Okay,
great. So now we perform an entire splay operation. So we splay once, and then again, and then
again, and then again, all the way up until we finally have the final version of N that's the root.
Play video starting at 5 minutes 3 seconds and follow transcript5:03
And we want to know what the total change in the potential function is over all of these little teeny
steps.
Play video starting at 5 minutes 10 seconds and follow transcript5:10
Well, what is it? Well, it's at most the sum of the changes in potential from each step. For the last one,
you have three times the final rank of N minus the rank of N one step before that, minus two.
Play video starting at 5 minutes 25 seconds and follow transcript5:25
You add to that three times the rank of N one step before the end minus the rank of N two steps
before the end, minus 2. Then you add three times the rank of N two steps before the end minus the
rank of N three steps before the end, minus 2, and so on and so forth. And this sum actually
telescopes. The rank of N one step before the end: there are two versions of it and they cancel. The
rank two steps before the end: there are two terms that cancel, and so on and so forth. And the only
terms that are left are: well, you've got three times the rank of N at the very end of your splay
operation, minus three times the rank of N at the very beginning of your splay operation, and then for
each of these steps, for each two levels that N went up the tree, you have this copy of -2. So that's
minus the depth of the node N.
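Written out (our notation: $r_0(N)$ is the rank of $N$ before the splay, $r_i(N)$ after the $i$-th of $k$ steps, and $D$ the starting depth), the telescoping sum is:

```latex
\begin{align*}
\Delta\Phi &\le \sum_{i=1}^{k} \Bigl(3\bigl(r_i(N) - r_{i-1}(N)\bigr) - 2\Bigr)
  = 3\bigl(r_k(N) - r_0(N)\bigr) - 2k\\
&\le 3\log_2 n - D + O(1),
\end{align*}
```

since every zig-zig or zig-zag step moves $N$ up two levels, so $2k \ge D - 1$ (a single final zig step, if any, accounts for the $O(1)$).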
Play video starting at 6 minutes 16 seconds and follow transcript6:16
And so the change in potential is just O of log(n) minus the depth, which is minus the work that you
did.
Play video starting at 6 minutes 23 seconds and follow transcript6:23
And note it's O of log(n) because the rank of n is at most log of the total number of nodes in the tree.
Play video starting at 6 minutes 30 seconds and follow transcript6:30
And so if you add the change in potential to the amount of work that you did, you get out O of log(n).
And so the amortized cost of your O of D work plus your splay operation is just O of log(n).
Play video starting at 6 minutes 46 seconds and follow transcript6:46
Now, that shows that our splay tree runs in O of log(n) amortized time per operation.
Play video starting at 6 minutes 52 seconds and follow transcript6:52
And if that was all you could say, there would be nothing really to be too excited about. I mean, it gets
about the same run time,
Play video starting at 7 minutes 0 seconds and follow transcript7:00
and maybe it's a little bit easier to implement. It's a little bit more annoying because it's only amortized
rather than worst case: some operations will be much more expensive than log(n), even if on average
the operations are pretty cheap.
Play video starting at 7 minutes 15 seconds and follow transcript7:15
But another great thing is that splay trees also have many other wonderful properties.
Play video starting at 7 minutes 22 seconds and follow transcript7:22
For example, if you assign weights to the nodes in any way such that the sum of the weights of all
nodes is equal to one, the amortized cost of accessing a node is actually just O(log(1/wt(N))). And
that means that if you spend a lot of time accessing high-weight nodes, it might actually be much
quicker than log(n) time per operation.
Play video starting at 7 minutes 46 seconds and follow transcript7:46
And note that this run time bound holds no matter what weights you assign. You don't need
to change the algorithm based on the weights; this bound happens automatically. And so if there are
certain nodes that get accessed much more frequently than others, you could just sort of artificially
assign them very high weights, and then that actually means that your splay tree automatically runs
faster than log(n) per operation. Another bound is the dynamic finger bound. The amortized cost of
accessing a node is O(log(D + 1)), where here D is the distance, in terms of the ordering between
nodes, between the last access and the current access. So if, say, you want to list all the nodes in order,
or search for all the nodes in order, it's actually pretty fast to do in a splay tree, because D is 1. It's a
constant per operation rather than O of log(n).
Play video starting at 8 minutes 43 seconds and follow transcript8:43
Another bound is the working set bound. The amortized cost of accessing a node N is O(log(t+1))
where t is the amount of time that has elapsed since that node N was last accessed.
Play video starting at 8 minutes 56 seconds and follow transcript8:56
And what that means, for example, is that if you tend to access nodes that you've accessed recently a
lot, so you access one node pretty frequently and then move to accessing a different node pretty
frequently, then this actually does a lot better again than O of log(n) per operation.
Play video starting at 9 minutes 19 seconds and follow transcript9:19
Finally we've got what's known as the dynamic optimality conjecture.
Play video starting at 9 minutes 24 seconds and follow transcript9:24
And this says if you give me any list of splay tree operations, inserts, finds, deletes whatever.
Play video starting at 9 minutes 32 seconds and follow transcript9:32
And then you build the best possible dynamic search tree for that particular sequence of operations.
You can have it completely optimized to perform those operations as best possible.
Play video starting at 9 minutes 46 seconds and follow transcript9:46
The conjecture says that if you run a splay tree on those operations it does worse by at most a
constant factor.
Play video starting at 9 minutes 53 seconds and follow transcript9:53
And that's pretty amazing. It would say that if there is any binary search tree that does particularly
well on a sequence of operations, then, at least conjecturally, a splay tree does too. So the conclusion
here is that splay trees are pretty fast; they require only O of log(n) amortized time per operation,
which, remember, can be a problem if you're worried that the occasional operation might take a
long time.
Play video starting at 10 minutes 18 seconds and follow transcript10:18
But in addition to this, splay trees can actually be much better if your input queries have extra
structure: if you query some nodes more frequently than others, or you tend to query nodes near the
ones that you most recently accessed, and things like that. But that's actually it for today. That's the
splay tree, and that's why they're considered to be useful.
Play video starting at 10 minutes 42 seconds and follow transcript10:42
And that's this course, I really hope that you've enjoyed it, I hope you'll come back for our next course
and best of luck.
Slides and External References
Slides
Download the slides for this lesson:
References
See the chapter 5.11.6 here.
Also see this visualization. Play with it by adding and erasing keys from it, and see how it can be
unbalanced, in contrast with AVL tree, but pulls the keys it works with to the top.
Splay Trees
Submit your assignment
Programming Assignment: Programming Assignment 4:
Binary Search Trees
You have not submitted. You must earn 3/5 points to pass.