DATA STRUCTURES
AND
ALGORITHMS
Lecture Notes 11
Sets and Maps
Spring 2008
    set1.insert(data1, data1 + 3);
    set2.insert(data2, data2 + 3);
    cout << "set1 is " << set1 << endl;
    cout << "set2 is " << set2 << endl;
    // Union: every element that is in set1 or in set2
    set_union(set1.begin(), set1.end(),
              set2.begin(), set2.end(),
              inserter(set_u, set_u.begin()));
    cout << "set1 + set2 is " << set_u << endl;
    // Intersection: only the elements that are in both sets
    set_intersection(set1.begin(), set1.end(),
                     set2.begin(), set2.end(),
                     inserter(set_i, set_i.begin()));
    cout << "set1 * set2 is " << set_i << endl;
    return 0;
}
Note:
– If a mapping exists, assignment through operator[] replaces its value.
– If a mapping does not exist, merely referencing the key with
operator[] creates one with a default value.
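The two notes above can be seen in a minimal sketch. The function name demo_subscript and the keys "Ann" and "Bob" are made up for illustration; only std::map and its operator[] come from the standard library.

```cpp
#include <map>
#include <string>

// Hypothetical demo function: shows both behaviours of map::operator[].
std::map<std::string, int> demo_subscript() {
    std::map<std::string, int> ages;
    ages["Ann"] = 30;           // no mapping yet: one is created, then assigned
    ages["Ann"] = 31;           // mapping exists: assignment replaces the value
    int unknown = ages["Bob"];  // a mere reference creates "Bob" -> 0
    (void)unknown;              // silence the unused-variable warning
    return ages;
}
```

After the call the map holds two entries, "Ann" -> 31 and "Bob" -> 0. To test for a key without creating it, find() or count() can be used instead of operator[].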
GIT – Computer Engineering Department 23
EX: Using a map to build an index
Index the words in a text with their line
numbers
– Use a map (map<string, list<int>>)
– Each word (string) is a key
– List of line numbers (list<int>) is a value
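One possible shape for such an index builder is sketched below. The names Index and build_index are assumptions, words are taken to be whitespace-separated tokens, and each line number is recorded at most once per word; the original slides may make different choices.

```cpp
#include <istream>
#include <list>
#include <map>
#include <sstream>
#include <string>

// Hypothetical alias: word -> list of line numbers where it occurs.
typedef std::map<std::string, std::list<int> > Index;

// Hypothetical helper: reads text line by line and indexes every word.
Index build_index(std::istream& in) {
    Index index;
    std::string line;
    int line_no = 0;
    while (std::getline(in, line)) {
        ++line_no;
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            // operator[] creates an empty list the first time a word is seen
            std::list<int>& lines = index[word];
            // record each line only once per word
            if (lines.empty() || lines.back() != line_no)
                lines.push_back(line_no);
        }
    }
    return index;
}
```

Because a map keeps its keys ordered, iterating over the result visits the words in alphabetical order, which is what a printed index needs.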
Considerations:
Devising a hash function
Deciding on the table size
Deciding what to do when a collision occurs
size_t hash = 0;
for (size_t i = 0; i < s.size(); ++i)
    hash = hash * 31 + s[i];
hash = hash % table.size();
– Cheap to compute
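The multiply-by-31 string hash can be wrapped in a standalone function, sketched here. The name hash_string and the table_size parameter are assumptions, and the cast to unsigned char is an addition so that characters with negative values do not affect the result.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: maps a string to a slot in [0, table_size).
// Assumes table_size > 0.
std::size_t hash_string(const std::string& s, std::size_t table_size) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);  // polynomial in 31
    return hash % table_size;  // reduce to a valid table index
}
```

The cost is one multiplication and one addition per character, so it stays cheap to compute even for long keys.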
A new table is created
58 collides at position 8. The cell one away is tried, but another collision occurs. It is inserted into the cell 2² = 4 away.
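The probe sequence described here (offsets 1, then 2² = 4 from the home cell) is quadratic probing. A minimal sketch follows, assuming non-negative integer keys, the hash function key mod table size, and a marker value EMPTY for free cells; the names insert_quadratic and EMPTY are made up.

```cpp
#include <cstddef>
#include <vector>

const int EMPTY = -1;  // assumption: keys are non-negative, so -1 marks a free cell

// Hypothetical helper: inserts key with quadratic probing and returns the
// index used, or -1 if no free cell is found within table.size() probes.
int insert_quadratic(std::vector<int>& table, int key) {
    const std::size_t n = table.size();
    if (n == 0) return -1;
    const std::size_t home = static_cast<std::size_t>(key) % n;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t pos = (home + i * i) % n;  // offsets 0, 1, 4, 9, ...
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return static_cast<int>(pos);
        }
    }
    return -1;  // probing failed; a larger table would be needed
}
```

With a table of size 10 that already holds 89, 18 and 49 (assumed keys, chosen so that cells 8 and 9 are occupied), inserting 58 probes cell 8, then cell 9, and finally lands in cell (8 + 4) mod 10 = 2.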
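Poor example:
TableSize = 10
hash1(x) = x mod 10
hash2(x) = x mod 9
If x = 99, what happens? hash2(99) = 0, so the step size is zero and every probe lands on the same cell.
Requirement: hash2(x) ≠ 0 for any x
Better choice: hash2(x) = 7 – (x mod 7)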
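The difference between the two second hash functions can be checked with a short double-hashing sketch. The names poor_hash2, good_hash2, insert_double and FREE are assumptions, keys are taken to be non-negative, and the first hash is key mod table size.

```cpp
#include <cstddef>
#include <vector>

const int FREE = -1;  // assumption: keys are non-negative, so -1 marks a free cell

// The poor choice from the example: can return 0 (e.g. for x = 99).
std::size_t poor_hash2(int x) { return static_cast<std::size_t>(x % 9); }

// The better choice: always in the range 1..7, never 0.
std::size_t good_hash2(int x) { return static_cast<std::size_t>(7 - (x % 7)); }

// Hypothetical helper: inserts key using double hashing with good_hash2 as
// the step size. Returns the index used, or -1 if no free cell was found.
int insert_double(std::vector<int>& table, int key) {
    const std::size_t n = table.size();
    if (n == 0) return -1;
    const std::size_t home = static_cast<std::size_t>(key) % n;  // hash1
    const std::size_t step = good_hash2(key);                    // hash2
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t pos = (home + i * step) % n;
        if (table[pos] == FREE) {
            table[pos] = key;
            return static_cast<int>(pos);
        }
    }
    return -1;
}
```

For x = 99, poor_hash2 gives a step of 0 while good_hash2 gives 6. If cell 9 of a size-10 table is already taken (say by 89, an assumed key), 99 is placed in cell (9 + 6) mod 10 = 5.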
Chaining
Alternative to open addressing
Each table slot references a linked list
– List contains all items that hash to that slot
– The linked list is often called a bucket
– So sometimes called bucket hashing
Examines only items with same hash code
– As opposed to open addressing (search chains may
overlap)
Insertion about as complex
Deletion is simpler
Linked lists can become long; rehash to shorten them
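A chained table can be sketched as a vector of linked lists. The class name Chained_Set, its member names, and the choice of int keys hashed by key mod bucket count are all assumptions made for illustration; rehashing is left out to keep the sketch short.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical class: a set of ints stored with chaining (bucket hashing).
class Chained_Set {
public:
    explicit Chained_Set(std::size_t buckets)
        : table(buckets > 0 ? buckets : 1), count(0) {}

    // Returns false if the key was already present.
    bool insert(int key) {
        std::list<int>& bucket = table[index_of(key)];
        if (std::find(bucket.begin(), bucket.end(), key) != bucket.end())
            return false;
        bucket.push_back(key);
        ++count;
        return true;
    }

    // Examines only the items that hash to the same slot.
    bool contains(int key) const {
        const std::list<int>& bucket = table[index_of(key)];
        return std::find(bucket.begin(), bucket.end(), key) != bucket.end();
    }

    // Deletion is a plain list erase; no "deleted" markers are needed.
    bool erase(int key) {
        std::list<int>& bucket = table[index_of(key)];
        std::list<int>::iterator it = std::find(bucket.begin(), bucket.end(), key);
        if (it == bucket.end()) return false;
        bucket.erase(it);
        --count;
        return true;
    }

    // Average bucket length; may exceed 1 with chaining.
    double load_factor() const {
        return static_cast<double>(count) / table.size();
    }

private:
    std::size_t index_of(int key) const {
        return static_cast<std::size_t>(key) % table.size();
    }
    std::vector<std::list<int> > table;
    std::size_t count;
};
```

A fuller version would watch load_factor() and move every item into a larger table once the lists grow too long.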
$$S \Rightarrow \frac{1}{2}\left(1 + \frac{1}{1 - \lambda}\right)$$
– As λ approaches 1
• Number of probes increases
• insertions might fail
– Rehashing with larger TableSize
• if λ > 0.5
• if insertion fails
Performance of Hash Tables
For chaining
– λ is the average length of a list
– Successful Find: λ/2 comparisons + time to
evaluate hash function
– Unsuccessful Find & Insert: λ comparisons +
time to evaluate hash function
Good choice: λ ≈ 1
Here λ can be greater than 1
Number of Probes
λ       Linear Probing   Chaining
0       1.00             1.00
0.25    1.17             1.13
0.5     1.50             1.25
0.75    2.50             1.38
0.83    3.38             1.43
0.9     5.50             1.45
0.95    10.50            1.48
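Most rows of the table can be reproduced with two short formulas. The sketch below assumes the linear probing column is ½(1 + 1/(1 − λ)) and the chaining column is 1 + λ/2; the second formula is inferred from the tabulated values rather than stated on the slide, and the function names are made up.

```cpp
// Hypothetical helper: expected probes for a successful search with
// linear probing at load factor lambda (requires 0 <= lambda < 1).
double probes_linear(double lambda) {
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Hypothetical helper: expected probes for a successful search with
// chaining; formula inferred from the table above.
double probes_chaining(double lambda) {
    return 1.0 + lambda / 2.0;
}
```

At λ = 0.5 these give 1.50 and 1.25, and at λ = 0.9 they give 5.50 and 1.45. The gap widens quickly as λ approaches 1 because the linear probing formula divides by 1 − λ.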
$$\frac{1}{\lambda}\int_0^{\lambda}\frac{1}{1 - x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1 - \lambda}$$
Starter function
void build_code() {
    code_map.clear();
    // Start at the root of the Huffman tree with an empty code
    build_code(huff_tree, Bit_String());
}
Recursive function
void Huffman_Tree::build_code(const Binary_Tree<Huff_Data>& tree,
                              const Bit_String& code) {
    if (tree.is_leaf()) {
        // A leaf holds a symbol: the bits collected on the path are its code
        Huff_Data datum = tree.get_data();
        code_map[datum.symbol] = code;
    } else {
        // Going left appends a 0 bit (false)
        Bit_String left_code(code);
        left_code.append(false);
        build_code(tree.get_left_subtree(), left_code);
        // Going right appends a 1 bit (true)
        Bit_String right_code(code);
        right_code.append(true);
        build_code(tree.get_right_subtree(), right_code);
    }
}