Академический Документы
Профессиональный Документы
Культура Документы
• Hash Tables
– The Modulus Operator
– Closed hash tables
– Open hash tables
– Hash table efficiency and “load factor”
– Hashing non-numeric values
– unordered_map: A hash-based STL
map class
• (Database) Tables
Big-OH Craziness
2
Name: Rick
Classes:
What is the big-oh to
Left Right Class: CS31
Next: nullptr determine if a student has
Name: Sal
taken a class?
Name: Linda Classes:
Classes: Left Right
Left Right
nullptr nullptr
nullptr nullptr
Class: Math31 bool HasTakenClass(
BTree &b,
Class: CS31 Next:
Next:
Class: EE10
Class: Math31
Next: nullptr
string &name,
Next: string &class
Class: EE100
Next: nullptr
)
Hash Tables
3
SHOULD HAVE
USED A HASH
TABLE BUT USED
A LINKED LIST
INSTEAD JAR
Hash Tables
Why should you care?
So pay attention!
The Modulus Operator
5
that interesting
Let’s modulus-divide a bunch of numbers
fact away in your
by 5 and see what the results are!
brain for later…
What do you notice?
0%5=0
1%5=1 When we divide numbers by 5, all of the
remainders are less than 5 (between 0-4)!
2%5=2
3%5=3 Let’s try again with 3 for fun!
4%5=4 When we divide numbers by 3, all of the
5%5=0 remainders are less than 3 (between 0-2)!
6%5=1 And as you’d guess, if you divided a bunch of
7%5=2 numbers by 100,000, the remainders would all
8%5=3 be less than 100,000 (between 0-99,999)!
9%5=4 Rule: When you divide by a given value
10 % 5 = 0 N, all of your remainders are guaranteed
11 % 5 = 1 to be between 0 and N-1!
The “Hash Table”
7
Challenge:
Build an ADT that holds a bunch of 9-digit student ID#s
such that the user can add new ID#s or
determine if the ADT holds an existing ID#
in just 1 step – not O(N) or O(log2N) but O(1).
8
024,641,083
f(x)
TRUE
… TRUE
605,172,432
… 99,999
TRUE
999,999,999
The (Almost) Hash Table
12
private: ...
int hashFunc(int idNum)
{ [5223]
return idNum % 100000; [5224]
} The true value in [5225]
bool m_array[100000]; so big!slot 83,948
...
// not
}; indicates that the
int main() value 400,683,948 [83947]
{ is held in our ADT.
[83948] true
AlmostHashTable2 x;
x.addItem(400683948); [83949]
x.addItem(111105224); ...
x.addItem(222205224);
}
83948
15
The (Almost) Hash Table
class AlmostHashTable2 Let’s see how it works.
{
111,105,224 % 100,000 =
public:
void addItem(int n) 5,224
m_array[0]
{
int bucket = hashFunc(n);
m_array[bucket] = true; The true value in slot [1]
}
5,224 indicates that
private: the value 111,105,224 ...
int hashFunc(int idNum) is held in our ADT. [5223]
{
return idNum % 100000; [5224] true
} [5225]
bool m_array[100000];
...
// not so big!
};
A collision is a
condition where two or array[0]
more values both [1]
“hash” to the same ...
111,105,224
bucket in the array.
[5223]
f(x) [5224] true
This causes ambiguity, [5225]
and we can’t tell what
value was actually 222,205,224 ...
stored in the array! [83947]
[83948]
Let’s see how to fix [83949]
this problem! ...
18
X
19
...
If the target bucket is empty, we
can store our value there. 111,105,224 f(x) [5223]
[5224] 111,105,224
[5225] 222,205,224
However, instead of storing true in 222,205,224 f(x)
the bucket, we store our full original ...
value – this prevents ambiguity!
[99997]
[99998]
If the bucket is occupied, scan down
from that bucket until we hit the first [99999]
open bucket. Put the new value there.
20
[99999] 100,399,999
21
array[0]
This approach addresses
[1]
collisions by putting each
value as close as possible to ...
its intended bucket. [5223]
[5224] 111,105,224
[5225]
Since we store every original 222,205,224
struct BUCKET
{
// a bucket stores a value (e.g. an ID#) If this field is false, it means
int idNum; that this Bucket in the array is
empty. If the field is true,
bool used; // is bucket in-use?
};
then it means this Bucket is
already filled with valid data.
25 #define NUM_BUCK 10
class HashTable First we compute the
Linear Probing:
Since our array has 10 slots, we
will loop up to 10 times looking
{
public:
starting bucket number. Inserting
for an empty space. If we don’t
find an empty space after 10
tries, our table is full!
void insert(int idNum)
{ We’ll store our new item in
int bucket = hashFunc(idNum); the first unused bucket that
we find, starting with the
for (int tries=0;tries<NUM_BUCK;tries++) bucket selected by our hash
{ function.
if (m_buckets[bucket].used == false) If the current bucket is
{ already occupied by an item,
m_buckets[bucket].idNum = idNum; advance to the next bucket
m_buckets[bucket].used = true; (wrapping around from slot 9
back to slot 0 when we hit the
return; end).
}
bucket = (bucket + 1) % NUM_BUCK;
}
// no room left in hash table!!! Here’s our hash function.
} As before, we compute our bucket
number by dividing the ID number by
private: the total # of buckets and then taking
int hashFunc(int idNum) const the remainder (%).
{ return idNum % NUM_BUCK; }
Our hash table has 10 slots, aka
BUCKET m_buckets[NUM_BUCK]; “buckets.”
};
26 #define NUM_BUCK 10 bucket = 29% NUM_BUCK
class HashTable bucket = 29 % 10 Linear Probing:
{ bucket = 9 Inserting
public: 29
void insert(int idNum) bucket 9 0 idNum: used: f
{
1 idNum: used: f
int bucket = hashFunc(idNum); idNum: used:
2 f
3 idNum: used: f
for (int tries=0;tries<NUM_BUCK;tries++) 4 idNum: used: f
{
5 idNum: used: f
if (m_buckets[bucket].used == false)
{
6 idNum: used: f
m_buckets[bucket].idNum = idNum; 7 idNum: used: f
m_buckets[bucket].used = true; 8 idNum: used: f
9 idNum: 29 used: f
return;
}
bucket = (bucket + 1) % NUM_BUCK; main()
}
// no room left in hash table!!!
{
} HashTable ht;
private:
ht.insert(29);
int hashFunc(int idNum) const
{ return idNum % NUM_BUCK; }
ht.insert(65);
ht.insert(79);
BUCKET m_buckets[NUM_BUCK];
}; }
27 #define NUM_BUCK 10 10 bucket = 65 % NUM_BUCK
class HashTable bucket = 65 % 10
Linear Probing:
{
bucket = 5 Inserting
public: 65
void insert(int idNum) bucket 5 0 idNum: used: f
{
1 idNum: used: f
int bucket = hashFunc(idNum); idNum: used: f
2
3 idNum: used: f
for (int tries=0;tries<NUM_BUCK;tries++) 4 idNum: used: f
{
5 idNum: 65 used: f
if (m_buckets[bucket].used == false)
{
6 idNum: used: f
m_buckets[bucket].idNum = idNum; 7 idNum: used: f
m_buckets[bucket].used = true; 8 idNum: used: f
9 idNum: 29 used: T
return;
}
bucket = (bucket + 1) % NUM_BUCK; main()
}
// no room left in hash table!!!
{
} HashTable ht;
private:
ht.insert(29);
int hashFunc(int idNum) const
{ return idNum % NUM_BUCK; }
ht.insert(65);
ht.insert(79);
BUCKET m_buckets[NUM_BUCK];
}; }
28 #define NUM_BUCK 10 10 bucket = 79 % NUM_BUCK
class HashTable bucket = 79 % 10 Linear Probing:
{ bucket = 9 Inserting
public: 79
void insert(int idNum) bucket 0
9 0 idNum: 79 used: f
{
1 idNum: used: f
int bucket = hashFunc(idNum); idNum: used: f
2
3 idNum: used: f
for (int tries=0;tries<NUM_BUCK;tries++) 4 idNum: used: f
{
5 idNum: 65 used: T
if (m_buckets[bucket].used == false)
{
6 idNum: used: f
m_buckets[bucket].idNum = idNum; 7 idNum: used: f
m_buckets[bucket].used = true; 8 idNum: used: f
9 idNum: 29 used: T
return;
}
bucket = (bucket + 1) % NUM_BUCK;
NUM_BUCKETS; main()
}
// no room left in hash table!!!
{
} HashTable ht;
private:
ht.insert(29);
int hashFunc(int idNum) const
{ return idNum % NUM_BUCK;
NUM_BUCKETS;
} }
ht.insert(65);
ht.insert(79);
BUCKET m_buckets[NUM_BUCK];
m_buckets[NUM_BUCKETS];
}; }
29 #define NUM_BUCK 10 Since we may have
class HashTable
Compute the starting
bucket where we expect Linear Probing:
collisions, in the worst
case, we may need to
{
public:
to find our item.
Searching
check the entire table!
(10 slots)
0 idNum: 79 used: T
1 idNum: used: f
2 idNum: used: f
So far, we’ve seen how to insert items
into our Linear Probe hash table. 3 idNum: used: f
4 idNum: used: f
What if we want to delete a value from our hash 5 idNum: 65 used: T f
table? 6 idNum: 15 used: T
So, in summary,-1 only use
7 idNum: 175 used: T
Let’s take a naïve approach and see what Closed/Linear Probing hash
8 idNum: used: f
happens… tables when you don’t
9 idNum: 29 used:
intend to delete items T
To delete the value, let’s just zero out our value and set the from your hash table.
used field to false...
Our closed hash table + linear probing works just fine, but
it still has a few problems:
It’s difficult to delete items
It has a cap on the number
of items it can hold… That’s a bummer.
To insert
searchafor
newanitem:
item:
array of
1. As before, compute a bucket pointers
# with your hash function: 0 nullptr
ID: 11
nullptr
ID: 1
nullptr
bucket = hashFunc(idNum); 1 nullptr
2 nullptr
thenew
linked listtoatthe ID: 3
nullptr
2.
2. Search
Add your value 3 nullptr ID: 101
nullptr
array[bucket] for your item
linked list at array[bucket]. 4 nullptr
ID: 25
nullptr
5 nullptr
3.
3. If we reach the end of the
DONE! 6 nullptr
list without finding our item, 7 nullptr Cool! Since the linked
it’s not in the table! 8 nullptr list in each bucket can
hold an unlimited
9 nullptr numbers of values… Our
open hash table is not
size-limited like our
Insert the following values: 1, 3, 11, 25, 101 closed one!
37
X
Answer: 0 nullptr
ID: 11
ID: 1
NULL
NULL
You just remove the value 1
from the linked list. 2 nullptr
ID: 3
nullptr
3 ID: 101
nullptr
Let’s delete the student with 4 nullptr
ID: 25
nullptr
ID=11 and see what happens… 5
6 nullptr
Cool! Unlike a closed hash 7 you
If nullptr
plan to repeatedly insert and
table, you can easily delete 8 values into the hash table, then
delete
items from an open hash table! 9the Open table is your best bet!
Also, you can insert more than N items into
your table and still have great performance!
Hash Table Efficiency
38
Answer:
It depends upon:
Answer:
It depends upon:
idNum: -1 GPA:
0 Name: etc…
Let’s assume we have a completely idNum: -1 GPA:
(or nearly) empty hash table… 1 Name: etc…
idNum: -1 GPA:
What’s the maximum number of steps 2 Name: etc…
required to insert a new value ? idNum: -1 GPA:
3 Name: etc…
Right! There’s zero chance of idNum: -1 GPA:
collision, so we can add our new value 4 Name: etc…
in one step! 5
idNum: -1 GPA:
Name: etc…
And finding an item in a nearly-empty 6
idNum: -1 GPA:
Name: etc…
hash table is just as fast!
idNum: -1 GPA:
7 Name: etc…
We have no collisions so either we idNum: -1 GPA:
find an item right away or we know it’s 8 Name: etc…
not in the hash table… 9
idNum: -1 GPA:
Name: etc…
Hash Table Efficiency
41
Given a particular load L for an Open Hash Table, it’s also easy
to compute the average # of tries to insert/find an item:
Challenge:
If you want to store up to 1000 items in an Open Hash Table and be able
to find any item in roughly 1.25 searches, how many buckets must your
hash table have?
Remember: Expected # of Checks = 1 + L/2
This result means:
If our hash table has 2000
“If you want to be able to find/insert items
buckets and we’re inserting a
into your open hash table in an average of 1.25
maximum of 1000 values, we are
steps, you need a load of .5, or roughly 2x
guaranteed to have an average of
more buckets than the maximum number of
1.25 steps per insert/search!
values you’ll put into your table.”
Answer:
Part 1: Set the equation above equal to 1.25 and solve for L:
1.25 = .25 = L/2 .5 = L
Part 2: Use the load formula to solve for “Required size”:
# of items to insert
______1000________
L= .5 = Required hash table size = 1000
Required hash table size Required hash table size .5
= 2000
buckets
So basically it’s a tradeoff!
47
return(total);
}
More complex to
Simplicity Easy to implement
implement
So pay attention!
“Tables”
57
struct Student
Heck, why not just create a {
whole C++ class for our table? string name;
class TableOfStudents int IDNum;
{ float GPA;
public: string phone;
TableOfStudents(); // construct a new table …
~TableOfStudents(); // destruct our table };
void addStudent(Student &stud); // add a new Student
Student getStudent(int s); // retrieve Students from slot s
int searchByName(string &name); // name is a searchable field
int searchByPhone(int phone); // phone is a searchable field
…
private:
vector<Student> m_students;
};
Tables
61
Name: Carey
ID #: 400683945
GPA: 4.0
Phone: 424 750-7519
But now we have two copies of every record, one in each tree!
If the records are big, that’s a waste of space!