Data Structures AND Algorithms: Lecture Notes 11

Department of Computer Engineering
DATA STRUCTURES
AND
ALGORITHMS
Lecture Notes 11
Sets and Maps
Spring 2008
GIT – Computer Engineering Department

Chapter Outline
The Set and Map containers and how to use them
Hash coding and its use in efficient search &
retrieval
Two forms of hash tables:
– Open addressing
– Chaining
– Their relative benefits and performance tradeoffs
Implementing both hash table forms
Introduction to implementation of Maps and Sets
Applying Maps and Sets to previous problems

Standard Library Containers

Associative Containers

The Set Abstraction
A set is a collection containing no duplicate
elements
Operations on sets include:
– Testing for membership
– Adding elements
– Removing elements
– Union
– Intersection
– Difference
– Subset
Set Implementation
set is a template class in the standard library
with template parameters:
– Key_Type : type of the items
– Compare : function class used to order the items
Comparison is not required by the set abstraction itself
– Compare has a default value (std::less<Key_Type>)

The set class
Function / Behavior
template<typename II> set(II first, II last)
    Constructs a set from the sequence of objects represented by the
    iterator range first .. last.
iterator begin()
const_iterator begin() const
    Returns an iterator to the first item in the set.
iterator end()
const_iterator end() const
    Returns an iterator to one past the last item in the set.
bool empty()
    Returns true if the set is empty.
size_t size()
    Returns the number of items in the set.
pair<iterator, bool> insert(const Key_Type& item)
    Inserts an item into the set. If the item is not in the set, the
    iterator will reference the inserted item and the bool will be true.
    If the item is already in the set, the iterator will reference the
    item currently in the set and the bool will be false.
template<typename II> void insert(II first, II last)
    Inserts the items from the range first .. last into the set.
    Duplicates are ignored.
void erase(const Key_Type& item)
    Removes the item from the set.
iterator find(const Key_Type&
    Returns an iterator to the item in the set. If

Implementing the Union Operator
/** Construct the union of two sets. */
template <typename Key_Type, typename Compare>
std::set<Key_Type, Compare> operator+(
const std::set<Key_Type, Compare>& left,
const std::set<Key_Type, Compare>& right){
typename std::set<Key_Type, Compare> result(left);
result.insert(right.begin(), right.end());
return result;
}

Implementing the Set Difference
/** Construct the difference of two sets. */
template <typename Key_Type, typename Compare>
std::set<Key_Type, Compare> operator-(
    const std::set<Key_Type, Compare>& left,
    const std::set<Key_Type, Compare>& right) {
  std::set<Key_Type, Compare> result(left);
  // Remove every element of right from the copy of left.
  for (typename std::set<Key_Type, Compare>::const_iterator itr = right.begin();
       itr != right.end(); ++itr)
    result.erase(*itr);
  return result;
}

Implementing Intersection
/** Construct the intersection of two sets */
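template <typename Key_Type, typename Compare>
std::set<Key_Type, Compare> operator*(
    const std::set<Key_Type, Compare>& left,
    const std::set<Key_Type, Compare>& right) {
  // Removing from left everything that is not in right leaves the intersection.
  return left - (left - right);
}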

Overloaded ostream insertion operator
// Overloading the ostream insertion operator
template<typename Item_Type>
std::ostream& operator<<(std::ostream& out, const
std::set<Item_Type>& a_set) {
out << "{";
bool first = true;
for (typename std::set<Item_Type>::const_iterator itr =
a_set.begin(); itr != a_set.end(); ++itr) {
if (first)
out << *itr;
else
out << ", " << *itr;
first = false;
}
return out << "}";
}

Set Functions in <algorithm>
Function / Behavior
template<typename II1, typename II2, typename OI, typename Compare>
OI set_difference(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less)
    Forms the set difference of the elements from the range first1 .. last1
    and first2 .. last2. The result is stored in the sequence beginning with
    result. The end of the result sequence is returned.
template<typename II1, typename II2, typename OI, typename Compare>
OI set_intersection(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less)
    Forms the set intersection of the elements from the range first1 .. last1
    and first2 .. last2. The result is stored in the sequence beginning with
    result. The end of the result sequence is returned.
template<typename II1, typename II2, typename OI, typename Compare>
OI set_union(II1 first1, II1 last1, II2 first2, II2 last2, OI result, Compare less)
    Forms the set union of the elements from the range first1 .. last1
    and first2 .. last2. The result is stored in the sequence beginning with
    result. The end of the result sequence is returned.

Set Example
int main()
{
set<string> set1;
set<string> set2;
set<string> set_u;
set<string> set_d;
set<string> set_i;
string data1[] = {"Apples", "Oranges", "Pineapples"};

string data2[] = {"Peaches", "Apples", "Grapes"};
set1.insert(data1, data1+3);
set2.insert(data2, data2+3);
cout << "set1 is " << set1 << endl;
cout << "set2 is " << set2 << endl;
set_union(set1.begin(), set1.end(),
set2.begin(), set2.end(),
inserter(set_u, set_u.begin()));
cout << "set1 + set2 is " << set_u << endl;

Set Example (2)
set_difference(set1.begin(), set1.end(),
set2.begin(), set2.end(),
inserter(set_d, set_d.begin()));
cout << "set1 - set2 is " << set_d << endl;
set_intersection(set1.begin(), set1.end(),
set2.begin(), set2.end(),
inserter(set_i, set_i.begin()));
cout << "set1 * set2 is " << set_i << endl;
bool is_member = (set1.find(string("Apples")) != set1.end());

cout << "\"Apples\" is an element of set1 is "
<< boolalpha << is_member << endl;
return 0;
}

vector vs. set
Sets allow no duplicates
Sets do not have positions, so no
operator[]
A set iterator produces the elements in sorted
order
How to implement a set?

The multiset
The multiset is the same as the set except
that duplicate entries are allowed.
The insert always inserts a new item.
The erase removes all occurrences of an
item.
The functions lower_bound and upper_bound
may be used to select a range from a set.
– In a multiset these functions can also return all
occurrences of a given value
How to implement a multiset??

Example of a multiset
int count_occurences(const multiset<string>& words_set,
const string& target) {
multiset<string>::const_iterator first_itr =
words_set.lower_bound(target);
cout << "*first_itr == " << (first_itr !=
words_set.end() ? *first_itr : "end()") << endl;
multiset<string>::const_iterator last_itr =
words_set.upper_bound(target);
cout << "*last_itr == " << (last_itr !=
words_set.end() ? *last_itr : "end()") << endl;
int count = 0;
for (multiset<string>::const_iterator itr = first_itr;
itr != last_itr; ++itr)
++count;
return count;
}

The std::pair
The std::pair is defined in <utility>
It is a struct (data members are public) that
contains two data items: first and second.
There is a template function make_pair that
can be used to construct pair objects.

Definition of pair
template<typename Type1, typename Type2>
struct pair {
Type1 first;
Type2 second;
pair(const Type1& x, const Type2& y): first(x),
second(y) {}
pair() : first(Type1()), second(Type2()) {}
template<typename Other_T1, typename Other_T2>
pair(const pair<Other_T1, Other_T2>& other) {
first = other.first; second = other.second;
}
};
template<typename Type1, typename Type2>
pair<Type1, Type2> make_pair(const Type1& x, const
Type2& y) {return pair<Type1, Type2>(x, y);}

Maps and the Map Interface
Map is related to Set: it is a set of ordered pairs
Ordered pair: (key, value)
– In a given map, there are no duplicate keys
– Values may appear more than once
Can think of key as “mapping to” a particular value
Maps support efficient organization of information in
tables
Mathematically, these maps are:
– Many-to-one (not necessarily one-to-one)
– Onto (every value in the map has a key)

Map Picture

The map functions
Template parameters
– Key_Type: The type of the keys
– Value_Type: The type of the values
– Compare: The function class that compares the
keys.
All functions defined for the set are defined for
the map, taking a pair
– pair<Key_Type, Value_Type>
Once a set is implemented, a map can be defined easily on top of it
One addition: the index operator
(operator[]) is also defined for the map
The map operator[]
The map can be used like an array, except
that the Key_Type is the index.
– Example:
map<string, string> a_map;
a_map["J"] = "Jane";
a_map["B"] = "Bill";
a_map["S"] = "Sam";
a_map["B1"] = "Bob";
a_map["B2"] = "Bill";
Note:
– If a mapping exists, assignment will replace it.
– If a mapping does not exist, a reference will
create one with a default value
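The two notes above can be checked with a small sketch using std::map. The function name demo_index_operator is hypothetical and exists only for this illustration.

```cpp
#include <map>
#include <string>

// Hypothetical helper: exercises operator[] and returns the resulting map.
std::map<std::string, std::string> demo_index_operator() {
    std::map<std::string, std::string> a_map;
    a_map["B"] = "Bill";           // no mapping yet: one is created, then assigned
    a_map["B"] = "Bob";            // mapping exists: the value is replaced
    std::string s = a_map["X"];    // merely reading creates "X" -> ""
    (void)s;
    // find() looks a key up without inserting anything.
    bool has_y = a_map.find("Y") != a_map.end();
    (void)has_y;
    return a_map;                  // contains "B" and "X", but not "Y"
}
```

When a lookup must not modify the map, find is the safer choice, which is why lookup functions that are declared const cannot use operator[].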
EX: Using a map to build an index
Index the words in a text with their line
numbers
– Use a map (map<string, list<int>>)
– Each word (string) is a key
– List of line numbers (list<int>) is a value

Using a map to build an index
typedef map<string, list<int> > map_type;
void build_index(istream& in, map_type& index) {

string next_line; // Each data line
int line_num = 0; // Line number
// Keep reading lines until done
while (getline(in, next_line)) {
line_num++;
String_Tokenizer tokenizer(next_line, " ,.:-!?/%\'\"");
// Insert each token in the index
while (tokenizer.has_more_tokens()) {
string word = tokenizer.next_token();
to_lower(word);
index[word].push_back(line_num);
}
}
}
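String_Tokenizer and to_lower are course helper code that is not shown here. As a rough self-contained variant, the sketch below splits words on whitespace only with std::istringstream and lowercases them with std::tolower; the name build_index_simple is hypothetical, and punctuation is not stripped as it is in the version above.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <list>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::list<int> > map_type;

// Simplified variant of build_index: words are split on whitespace only.
void build_index_simple(std::istream& in, map_type& index) {
    std::string next_line;   // Each data line
    int line_num = 0;        // Line number
    while (std::getline(in, next_line)) {
        ++line_num;
        std::istringstream words(next_line);
        std::string word;
        while (words >> word) {
            for (std::size_t i = 0; i < word.size(); ++i)
                word[i] = static_cast<char>(
                    std::tolower(static_cast<unsigned char>(word[i])));
            // operator[] creates an empty list the first time a word is seen.
            index[word].push_back(line_num);
        }
    }
}
```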

Maps
How to implement Maps?
– What is the efficiency?


Hash Tables
Goal: access item given its key (not its position)
Therefore, want to locate it directly from the key
In other words, we wish to avoid much searching
Hash tables provide this capability

– Constant time in the average case! O(1)
– Linear time in the worst case O(n)
Searching an array: O(n) Searching BST: O(log n)

Hash Codes
Suppose we have a table of size N
A hash code is a number in the range 0 to N-1
We compute the hash code from the key
– a “default position” when inserting
– a “position hint” when looking up
A hash function is a way of computing a hash code
Desire: The set of keys should spread evenly over the N values
When two keys have the same hash code: collision

EX: Character Frequencies in a Text
To build a Huffman tree, we need the number of
occurrences of each character
Hash key: character
Hash value: character frequency
Assume text includes ASCII characters
– There are 128 characters in ASCII
Hash table size is 128
– Character ASCII value is the index in the table
• HashTable[65] is frequency of character A
Hash function: returns the character value
What if the characters are Unicode?

EX: Character Frequencies in a Text (2)
There are 2^16 possible characters, but ...
– Maybe only 100 or so occur in a text
Choosing a table size of 2^16 wastes space
Approach: hash characters to the range 0-199
– That is, use a hash table of size 200
A possible hash function for this example:
int hash = uni_char % 200;
Collisions are certainly possible (see later)
– Mapping 2^16 possible characters to 200 indices

EX: An ideal hash table
Each key is mapped to a
different index !
– Not always possible
• many keys, finite indices
Even distribution
Considerations:
– Devising a hash function
– Deciding on the table size
– Deciding what to do when a collision occurs

Devising Hash Functions
Simple functions often produce many
collisions
– ... but complex functions may not be good either!
It is often an empirical process
If keys are integers
– hash function : key % table.size()
• Ex: TableSize = 10
Keys = 120, 330, 1000 all hash to slot 0
– TableSize should be prime
• Provides a more even distribution
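The effect of the table size can be seen with a short sketch that counts how many distinct slots a list of keys occupies under key % table_size. The helper name distinct_slots and the prime 13 are hypothetical choices made for this illustration.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Number of distinct slots used when each key is hashed as key % table_size.
// Keys are assumed to be non-negative.
std::size_t distinct_slots(const std::vector<int>& keys, int table_size) {
    std::set<int> slots;
    for (std::size_t i = 0; i < keys.size(); ++i)
        slots.insert(keys[i] % table_size);
    return slots.size();
}
```

For the keys 120, 330, 1000, a table size of 10 uses a single slot (0), while a table size of 13 uses three slots (3, 5, and 12).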

For strings:
Adding the letter values in a string
– Same hash for strings with the same letters in a different order
– Poor if TableSize is large and the number of characters is small
• TableSize = 10000 & number of characters in a key = 8
127 * 8 = 1016 < 10000, so most of the table is never used
Use all characters
$\sum_{i=0}^{KeySize-1} 32^{i} \cdot Key[KeySize - i - 1]$
– Early characters do not count for long keys
• Use only some number of characters
• Use characters in odd positions
Use first three characters
729*key[2] + 27*key[1] + key[0]
– If the keys are not random, some part of the table is not used

For strings:
Better approach:
size_t hash = 0;
for (size_t i = 0; i < s.size(); ++i)
    hash = hash * 31 + s[i];
hash = hash % table.size();
This is the hash function used by Java in its String class
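To contrast the two string approaches, here is a minimal sketch of an additive hash next to the multiply-by-31 hash, both without the final remainder step. The function names additive_hash and poly_hash are hypothetical, and characters are cast to unsigned char, which matches the slide code for ASCII text.

```cpp
#include <cstddef>
#include <string>

// Additive hash: the sum of the character values, so order is ignored.
std::size_t additive_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash += static_cast<unsigned char>(s[i]);
    return hash;
}

// Multiply-by-31 hash, before the final "% table.size()" step.
std::size_t poly_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);
    return hash;
}
```

The anagrams "stop" and "pots" get the same additive hash but different multiply-by-31 hashes, because each multiplication makes the position of a character matter.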

The String hash is good in that:
– Every letter affects the value
– The order of the letters affects the value
– Hash function distributes the hash values well
over the integers
• Probability of two strings having the same hash value is
quite small
• The hash function can be assumed to produce random
hash values
The hash value may not be unique
– Too many possible strings
Collision is still possible
Guidelines for good hash functions:
– Spread values evenly: as if “random”
– Cheap to compute
Generally, the number of possible values is much greater than the table size

Collision Problem
Will consider two ways to organize hash
tables
– Open addressing
– Chaining

Open Addressing
Hashed items are in a single array
Hash code gives position “hint”
Handle collisions by checking multiple positions
Each check is called a probe of the table
On a collision, try alternate cells
h_0(x), h_1(x), h_2(x), …
h_i(x) = (hash(x) + F(i)) % table.size()
F(0) = 0

Linear Probing
Probe by incrementing the index
– F(i) = i
If “fall off end”, wrap around to the beginning
– Take care, not to cycle forever!
1. Compute index as hash_fcn() % table.size()
2. if table[index] == NULL, item is not in the table
3. if table[index] matches item, found item (done)
4. Increment index circularly and go to 2
Why must we probe repeatedly?
– hashCode may produce collisions
– remainder by table.size may produce collisions
Search Termination
Ways to obtain proper termination
– Stop when you come back to your starting point
– Stop after probing N slots, where N is table size
– Stop when you reach the bottom the second time
– Ensure table never full
• Reallocate when occupancy exceeds threshold

Hash Table Example
Table of strings, initial size 5
Name      Hash code    Hash % 5   Slot
“Tom”     84274        4          4
“Dick”    2129869      4          0 (wraps)
“Harry”   69496448     3          3
“Sam”     82879        4          1 (wraps)
“Pete”    2484038      3          2 (wraps)
Note: many lookups will probe a lot!
Size 11 gives these slots: 3, 5, 10, 5 → 6 (“Sam” collides with “Dick”), 7
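The slots in this example can be reproduced with a short linear-probing sketch. It assumes the multiply-by-31 string hash shown earlier and uses an empty string to mark a free slot; the names string_hash and insert_linear are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Multiply-by-31 string hash, without the remainder step.
std::size_t string_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);
    return hash;
}

// Inserts key with linear probing and returns the slot used, or
// table.size() if every slot was probed without finding a free one.
// An empty string marks a free slot (an assumption of this sketch).
std::size_t insert_linear(std::vector<std::string>& table,
                          const std::string& key) {
    std::size_t index = string_hash(key) % table.size();
    for (std::size_t probes = 0; probes < table.size(); ++probes) {
        if (table[index].empty() || table[index] == key) {
            table[index] = key;
            return index;
        }
        index = (index + 1) % table.size();  // wrap around at the end
    }
    return table.size();  // stopped after probing N slots: table is full
}
```

Bounding the loop by the table size is one of the termination strategies listed earlier: stop after probing N slots.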

Hash Table Example
Insert keys {89, 18, 49, 58, 69}
When 49 is inserted, a collision occurs
– Put into the next available index, 0
58 collides with 18, 89, and 49
– Inserted at index 1

Hash Table Considerations
Cannot traverse a hash table
– Order of stored values is arbitrary
– Can use an iterator to produce in arbitrary order
When item is deleted, cannot just set its entry to null
– Doing so would break probing
– Must store a “dummy value” instead
– Deleted items waste space and reduce efficiency
• Need to go through dummy values during search and even during
insertion
– Operations take O(n) time in the worst case
Higher occupancy causes more collisions

Reducing Collisions By Growing
Choose a new larger size, e.g., doubling
Similar to reallocating a vector
– But, elements can move around in reinsertion
• Simple copy is not possible
(Re)insert non-deleted items into new array

Install the new array and drop the old
Rehashing distributes items at least as well

Deleted items (dummy values) are not inserted

Rehashing Example
If 23 is inserted, the table is over 70 percent full.
A new table is created.
17 is the first prime twice as large as the old one, so
H_new(X) = X mod 17

Rehashing
Rehashing is an expensive operation
– Running time is O(N)
Rehashing frees the programmer from
worrying about table size
Amortized Analysis: Average over N
operations
– Operations take: O(1) time

Quadratic Probing
Linear probing
– Tends to form long clusters of keys in the table
– This causes longer search chains
Ex:

Quadratic Probing
Quadratic probing can reduce the effect of primary
clustering
Index increments form a quadratic series
– F(i) = i^2
Hash values: s, s+1^2, s+2^2, s+3^2, s+4^2, etc.
– (all % TableSize)
Direct calculation involves multiply, add, remainder
– Incremental calculation better
Initially:
size_t index = ... 1st probe slot ...
int k = -1;
At each iteration:
k += 2;
index = (index + k) % table.size();
Quadratic Probing
When 49 collides with 89, the next position attempted is one cell away.
58 collides at position 8. The cell one away is tried, and another
collision occurs. It is inserted into the cell 2^2 = 4 away.
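The walk-through above can be sketched with the incremental update k += 2. The sketch assumes a table of size 10 with hash(x) = x % 10, which is consistent with the positions mentioned but is not stated in the text, and non-negative keys; the names kEmptySlot and insert_quadratic are hypothetical.

```cpp
#include <cstddef>
#include <vector>

const int kEmptySlot = -1;  // marks a free slot; keys are assumed non-negative

// Inserts key using quadratic probing. The update k += 2 followed by
// index = (index + k) % size visits s, s+1, s+4, s+9, ... without multiplying.
// Returns the slot used, or table.size() if no free slot was found
// within table.size() probes.
std::size_t insert_quadratic(std::vector<int>& table, int key) {
    std::size_t index = static_cast<std::size_t>(key) % table.size();
    int k = -1;
    for (std::size_t probes = 0; probes < table.size(); ++probes) {
        if (table[index] == kEmptySlot) {
            table[index] = key;
            return index;
        }
        k += 2;
        index = (index + static_cast<std::size_t>(k)) % table.size();
    }
    return table.size();
}
```

With these assumptions, inserting 89, 18, 49, 58, 69 in that order places them in slots 9, 8, 0, 2, and 3. Unlike linear probing, the loop can give up while free slots remain, since the probe sequence does not necessarily visit every slot.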

Quadratic Probing
Probe sequence may not produce all table
slots
– A loop around full cells may happen
– Hash table not full but empty space not found
Theorem: If the table size is prime and more than half of the table
is empty, a new element can always be inserted.
Problem : Secondary clustering!...

Double Hashing
Double Hashing solves the clustering problem
Use second hash function
F(i) = i * hash2(x)
Poor example :
hash2(x) = x mod 9
hash1(x) = x mod 10
TableSize = 10
If x = 99, what happens? hash2(99) = 0, so every probe lands on the same cell
Requirement: hash2(x) ≠ 0 for any x

Double Hashing
Good choice :
hash2(x) = R – (x mod R)
R is a prime and < TableSize

Double Hashing
hash2(x) = 7 – (X mod 7)
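A minimal sketch of double hashing with hash2(x) = R - (x mod R) follows. It assumes hash1(x) = x % table.size() and non-negative keys, and the test values use a table of size 10 with R = 7 and the keys from the earlier probing examples; the names kFreeSlot and insert_double are hypothetical.

```cpp
#include <cstddef>
#include <vector>

const int kFreeSlot = -1;  // marks a free slot; keys are assumed non-negative

// Inserts key using double hashing: the i-th probe is
// (hash1(key) + i * hash2(key)) % size, with hash2(key) = R - (key % R).
// Returns the slot used, or table.size() if no free slot was found
// within table.size() probes.
std::size_t insert_double(std::vector<int>& table, int key, int R) {
    std::size_t size = table.size();
    std::size_t index = static_cast<std::size_t>(key) % size;
    // The step is always in the range 1..R, so it can never be 0.
    std::size_t step = static_cast<std::size_t>(R - (key % R));
    for (std::size_t probes = 0; probes < size; ++probes) {
        if (table[index] == kFreeSlot) {
            table[index] = key;
            return index;
        }
        index = (index + step) % size;
    }
    return size;
}
```

Under these assumptions, 89 and 18 go to slots 9 and 8; 49 collides at 9 and moves 7 cells to slot 6; 58 collides at 8 and moves 5 cells to slot 3; 69 collides at 9 and moves 1 cell to slot 0.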
Chaining
Alternative to open addressing
Each table slot references a linked list
– List contains all items that hash to that slot
– The linked list is often called a bucket
– So sometimes called bucket hashing
Examines only items with same hash code
– As opposed to open addressing (search chains may
overlap)
Insertion about as complex
Deletion is simpler
Linked lists can become long: rehash

Chaining Picture
Two items hashed to bucket 3
Three items hashed to bucket 4

Performance of Hash Tables
Load factor = # filled cells / table size
– Between 0 and 1
Load factor has greatest effect on
performance
Lower load factor better performance
– Reduce collisions in sparsely populated tables

For open addressing, linear probing
– Knuth gives the expected number of probes (λ is the load factor)
Insert & unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
Successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
– As λ approaches 1
• Number of probes increases
• insertions might fail
– Rehashing with larger TableSize
• if λ > 0.5
• if insertion fails
For chaining
– λ is the average length of a list
– Successful Find: λ/2 comparisons + time to
evaluate the hash function
– Unsuccessful Find & Insert: λ comparisons +
time to evaluate the hash function
Good choice: λ ~ 1
Here λ can be greater than 1
Disadvantage of separate chaining is the need to allocate/deallocate memory!
Number of Probes
λ       Linear Probing   Chaining
0       1.00             1.00
0.25    1.17             1.13
0.5     1.50             1.25
0.75    2.50             1.38
0.83    3.38             1.43
0.9     5.50             1.45
0.95    10.50            1.48
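Most rows of this table can be reproduced from closed-form expressions. The sketch below uses Knuth's linear-probing formulas and, as an assumption inferred from the chaining column rather than stated in the notes, 1 + λ/2 for a successful search with chaining. The function names are hypothetical.

```cpp
// Expected probes for a successful search with linear probing (Knuth).
double probes_linear_successful(double lambda) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Expected probes for an insert or unsuccessful search with linear probing.
double probes_linear_unsuccessful(double lambda) {
    return 0.5 * (1.0 + 1.0 / ((1.0 - lambda) * (1.0 - lambda)));
}

// Assumed formula for a successful search with chaining: one probe for
// the bucket plus, on average, half of a list of length lambda.
double probes_chaining_successful(double lambda) {
    return 1.0 + lambda / 2.0;
}
```

At λ = 0.5 these give 1.50 and 1.25, matching the table; the unsuccessful-search cost for linear probing at the same load is already 2.5 probes.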

Random Collision Resolution
Performance of double hashing is better than linear probing
Assume: Random collision resolution
– Probes are independent
– No clustering problem
Unsuccessful search and Insert
– Number of probes until an empty cell is found
(1 − λ) = fraction of cells that are empty
1 / (1 − λ) = expected number of probes until an empty cell is found
Successful search
P(X) = number of probes when the element X was inserted
The average $\frac{1}{N}\sum_{X} P(X)$ is approximately
$\frac{1}{\lambda}\int_{0}^{\lambda}\frac{1}{1-x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}$

Performance of Hash Tables (3)
Hash table:
– Insert: average O(1)
– Search: average O(1)
Sorted array:
– Insert: average O(n)
– Search: average O(log n)
Binary Search Tree:
– Insert: average O(log n)
– Search: average O(log n)
But balanced trees can guarantee O(log n)
Implementing Hash Tables
Class hash_map: used for both implementations
typedef std::pair<const Key_Type, Value_Type> Entry_Type;
File Hash_Table_Open.h: implements open addressing
Class Hash_Table_Chain: implements chaining
Further implementation concerns

Class hash_map
Function / Behavior
iterator begin()
const_iterator begin() const
    Returns an iterator to the first entry in the map.
iterator end()
const_iterator end() const
    Returns an iterator to one past the last entry in the map.
bool empty()
    Returns true if the map is empty.
size_t size()
    Returns the number of entries in the map.
pair<iterator, bool> insert(const Entry_Type& entry)
    Inserts an entry into the map. If the key is not in the map, the
    returned iterator references the inserted entry, and the bool will be
    true. If the key is currently in the map, the returned iterator
    references the existing entry, and the bool will be false.
void erase(const Key_Type& key)
    Removes the item from the map.
iterator find(const Key_Type& key)
    Returns an iterator that references the item in the map. If not
    present, end() is returned.
Value_Type& operator[](const Key_Type& key)
    Returns a reference to the value mapped to the key. If the key is not
    currently in the map, a default value is inserted, and a reference to
    the inserted item is returned.

Hash_Table_Open
Data Fields and Private Functions
Data Field / Attribute
hash<Key_Type> hash_fcn
    The function object that will compute the hash function for the Key_Type.
size_t num_keys
    The number of keys in the table, excluding deleted keys.
size_t num_deletes
    The number of deleted keys.
std::vector<Entry_Type*> the_table
    The vector to hold the hash table.
const double LOAD_THRESHOLD
    The maximum load factor.
static const size_t INITIAL_CAPACITY
    The initial capacity.
const Entry_Type dummy
    A dummy entry to represent deleted keys.
const Entry_Type* DELETED
    A pointer to the dummy entry.
Function / Behavior
size_t locate(const Key_Type& key)
    Returns the index of the specified key if present in the table;
    otherwise, returns the index of the first free slot.
void rehash()
    Doubles the capacity of the table and permanently removes deleted items.

The hash template class
The hash template class is a function class
that implements the hash function for its
template parameter.
It is defined as follows:
template<typename Key_Type>
struct hash {
size_t operator()(const Key_Type&);
};
We will describe some implementations later

The locate function
size_t locate(const Key_Type& key) {
size_t index = hash_fcn(key) % the_table.size ();
while (the_table[index] != NULL
&& (the_table[index] == DELETED
|| the_table[index]->first != key))
index = (index + 1) % the_table.size ();
return index;
}

The insert function
std::pair<iterator, bool> insert(const Entry_Type& entry) {
double load_factor = double(num_keys + num_deletes) /
the_table.size();
if (load_factor > LOAD_THRESHOLD){
rehash(); // Double the size of the table.
}
// Find the position in the table.
size_t index = locate(entry.first);
// See whether it is empty.
if (the_table[index] == NULL) {
// Create a new entry.
the_table[index] = new Entry_Type(entry);
num_keys++;
return std::make_pair(iterator(this, index), true);
} else {
// Item is already in the table.
return std::make_pair(iterator(this, index), false);
}
}

The index operator
Value_Type& operator[](const Key_Type& key) {
// Try to insert a dummy item.
std::pair<iterator, bool> ret = insert(Entry_Type(key,
Value_Type()));
// Return a reference to the value found or inserted.
return ret.first->second;
}

The rehash function
void rehash() {
// Create a new table whose size is double the current table.
std::vector<Entry_Type*> other_table(the_table.size() * 2, NULL);
// Swap this table with the current table.

the_table.swap(other_table);
// Reinsert all items from old table to new.

num_deletes = 0;
for (size_t i = 0; i < other_table.size(); i++) {
if ((other_table[i] != NULL) && (other_table[i] != DELETED)) {
size_t index = locate(other_table[i]->first);
the_table[index] = other_table[i];
}
}
}

The erase function
Algorithm for erase
1. Find the first table element that is empty or that
contains the key.
2. if an empty element is found
3. done
4. else
5. Delete the Entry_Type object pointed to.
6. Set the pointer in this entry to DELETED
7. Increment num_deletes
8. Decrement num_keys
Implementation is left as an exercise
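As one possible reading of the algorithm, here is a self-contained sketch on a simplified table of non-negative int keys. It is not the course's hash_map: slots hold ints instead of Entry_Type pointers, so step 5 has no object to delete, and the names Int_Table, kEmpty, and kDeleted are hypothetical. The table never rehashes, so the caller must keep at least one never-used slot.

```cpp
#include <cstddef>
#include <vector>

const int kEmpty = -1;    // slot that has never been used
const int kDeleted = -2;  // dummy value left behind by erase

class Int_Table {
public:
    explicit Int_Table(std::size_t capacity)
        : slots(capacity, kEmpty), num_keys(0), num_deletes(0) {}

    bool insert(int key) {
        std::size_t index = locate(key);
        if (slots[index] == key) return false;  // already present
        slots[index] = key;
        ++num_keys;
        return true;
    }

    bool erase(int key) {
        std::size_t index = locate(key);            // step 1
        if (slots[index] == kEmpty) return false;   // steps 2-3: not found
        slots[index] = kDeleted;                    // steps 5-6: leave a dummy
        ++num_deletes;                              // step 7
        --num_keys;                                 // step 8
        return true;
    }

    bool contains(int key) const { return slots[locate(key)] == key; }
    std::size_t size() const { return num_keys; }
    std::size_t deleted_count() const { return num_deletes; }

private:
    // Returns the index of key if present, otherwise the index of the first
    // never-used slot on its probe path. Deleted slots are skipped because
    // kDeleted never equals a valid key.
    // Precondition: at least one slot is kEmpty.
    std::size_t locate(int key) const {
        std::size_t index = static_cast<std::size_t>(key) % slots.size();
        while (slots[index] != kEmpty && slots[index] != key)
            index = (index + 1) % slots.size();
        return index;
    }

    std::vector<int> slots;
    std::size_t num_keys;
    std::size_t num_deletes;
};
```

With capacity 10, inserting 89, 18, and 49 puts 49 in slot 0 after it collides with 89 in slot 9. Erasing 89 leaves a dummy in slot 9, so a later search for 49 still probes past it; marking the slot empty instead would make 49 unreachable.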

Copy Constructor, etc.
Because the vector<Entry_Type*> contains pointers
to dynamically created objects, we must define the
copy constructor, assignment operator, and
destructor.
hash_map(const hash_map<Key_Type, Value_Type>& other) :
hash_fcn(hash<Key_Type>()), num_keys(0),
the_table(other.the_table.size(), NULL),
LOAD_THRESHOLD(0.75),
num_deletes(0) {
for (size_t i = 0; i < other.the_table.size(); i++) {
if (other.the_table[i] != NULL &&
other.the_table[i] != DELETED)
insert(Entry_Type(other.the_table[i]->first,
other.the_table[i]->second));
}
}

Hash_Table_Chain.h
Data Field / Attribute
hash<Key_Type> hash_fcn
    The hash function object.
size_t num_keys
    The number of keys.
std::vector<std::list<Entry_Type> > the_buckets
    A vector of lists containing the items.
static const size_t INITIAL_CAPACITY
    The initial capacity.
static double LOAD_THRESHOLD
    The maximum load threshold before rehashing.

The insert function
std::pair<iterator, bool>
insert(const Entry_Type& entry) {
// Check for the need to rehash.
double load_factor = double(num_keys) / the_buckets.size();
if (load_factor > LOAD_THRESHOLD){rehash();}
// Find the position in the table.
size_t index = hash_fcn(entry.first) % the_buckets.size();
// Search for the key.
typename std::list<Entry_Type>::iterator pos =
the_buckets[index].begin();
while (pos != the_buckets[index].end()
&& pos->first != entry.first) ++pos;
if (pos == the_buckets[index].end()) { // Not in table
the_buckets[index].push_back(Entry_Type(entry));
num_keys++;
return std::make_pair(iterator(this, index,
--(the_buckets[index].end())),
true);
} else { // Already there
return std::make_pair(iterator(this, index, pos), false);
}
}

Copy Constructor, etc.
Since Hash_Table_Chain.h uses a
std::vector<std::list<Entry_Type> > to hold the
entries:
– The vector and list both define the copy
constructor, etc. to make a deep copy and to
clean up any dynamically allocated objects when
destroyed.
Therefore, no special implementation is
required for Hash_Table_Chain.h

Implementation Considerations
The hash template class defines the hash function,
but provides no implementation.
The user of the hash_map (either
Hash_Table_Open.h or Hash_Table_Chain.h) must
define a specialization for the Key_Type.
Example for string:
template<>
struct hash<std::string> {
size_t operator()(const std::string& s) {
size_t result = 0;
for (size_t i = 0; i < s.length(); i++) {
result = result * 31 + s[i];
}
return result;
}
};

Specialization for int
One possibility for int is to merely return the
integer value, cast to size_t.
– This does not produce a “random” distribution of
keys.
An alternative is to multiply by some large
prime and take the result modulo 2^w, where w is the
computer’s word size in bits (unsigned overflow does this implicitly).
template<>
struct hash<int> {
size_t operator()(int i) {
return size_t(4262999287U * i);
}
};

Defining Your Own hash function
The hash_map class implementations use the
hash<Key_Type> function to locate the initial search
position.
They then use the object's equality operator
(operator==) to determine if there is a match.
Therefore, your hash function should obey the
following constraint:
– If obj1 == obj2 then
hash<type>()(obj1) == hash<type>()(obj2)

Applying Maps: Phone Directory
string Phone_Directory::add_or_change_entry(
const string& name, const string& number) {
string old_number = the_directory[name];
the_directory[name] = number;
modified = true;
return old_number;
}
string Phone_Directory::lookup_entry(const string& name) const
{
const_iterator itr = the_directory.find(name);
if (itr != the_directory.end())
return itr->second;
else
return "";
}

Applying Maps: Phone Directory (2)
string Phone_Directory::remove_entry(const
string& name)
{
string old_number = the_directory[name];
the_directory.erase(name);
modified = old_number != string();
return old_number;
}

Applying Maps: Huffman Coding
Algorithm for build_frequency_table
1. while there are more characters in the input file
2. Read a character
3. Increment the entry in the map associated with this character
4. for each entry in the map
5. Store its data as the weight-symbol pair in the
vector<Huff_Data>
6. Return the vector<Huff_Data>

The build_frequency_table function
vector<Huff_Data> Huffman_Tree::build_frequency_table(istream&
in) {
map<char, int> frequencies;
char c;
while (in.get(c)) {
frequencies[c]++;
}
vector<Huff_Data> result;
for (map<char, int>::iterator itr = frequencies.begin();
itr != frequencies.end(); ++itr) {
result.push_back(Huff_Data(itr->second, itr->first));
}
return result;
}
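Huff_Data and Huffman_Tree belong to the course code and are not shown here, so the counting step (steps 1 to 3 of the algorithm) is sketched on its own below. The name count_characters and the string overload are hypothetical conveniences.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Counts how often each character occurs in the stream.
std::map<char, int> count_characters(std::istream& in) {
    std::map<char, int> frequencies;
    char c;
    while (in.get(c))
        ++frequencies[c];  // operator[] creates a zero count on first use
    return frequencies;
}

// Convenience overload for an in-memory string.
std::map<char, int> count_characters(const std::string& text) {
    std::istringstream in(text);
    return count_characters(in);
}
```

For the text "abracadabra" the map holds five entries: a occurs 5 times, b and r twice each, c and d once each. Iterating over the map then visits the characters in sorted order.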

build_code
Additional data field:
std::map<char, Bit_String> code_map;
Starter function
void build_code() {
code_map.clear();
build_code(huff_tree, Bit_String()); }
Recursive function
void Huffman_Tree::build_code(const Binary_Tree<Huff_Data>& tree,
const Bit_String& code) {
if (tree.is_leaf()) {
Huff_Data datum = tree.get_data();
code_map[datum.symbol] = code;
} else {
Bit_String left_code(code);
left_code.append(false);
build_code(tree.get_left_subtree(), left_code);
Bit_String right_code(code);
right_code.append(true);
build_code(tree.get_right_subtree(), right_code);
}
}

encode
void Huffman_Tree::encode(std::istream& in, std::ostream& out)
{
Bit_String result;
char next_char;
while (in.get(next_char)) {
result += code_map[next_char];
}
result.write(out);
}
