3.4 HASH TABLES
‣ hash functions
‣ separate chaining
‣ linear probing
‣ context
ST implementations: summary
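implementation                       worst case                 average case                ordered  key
                                     search  insert  delete     search hit  insert  delete  ops?     interface
sequential search (unordered list)   N       N       N          N/2         N       N/2     no       equals()
binary search (ordered array)        lg N    N       N          lg N        N/2     N/2     yes      compareTo()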
Q. Can we do better?
A. Yes, but with different access to the data.
Hashing: basic plan
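Hash function. Method for computing array index from key.
[Figure: table with indices 0 to 5; hash("it") = 3 stores "it" at index 3, and hash("times") = 3 maps to the same index (??).]
Issues. Two different keys, such as "it" and "times", can hash to the same index.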
Ex 1. Phone numbers.
・Bad: first three digits.
・Better: last three digits.
Java’s hash code conventions
All Java classes inherit a method hashCode(), which returns a 32-bit int.
[Figure: objects x and y with their hash codes x.hashCode() and y.hashCode().]
Implementing hash code: strings
Ex. String s = "call";
    int code = s.hashCode();
$3045982 = 99 \cdot 31^3 + 97 \cdot 31^2 + 108 \cdot 31^1 + 108 \cdot 31^0$
$= 108 + 31 \cdot (108 + 31 \cdot (97 + 31 \cdot 99))$   (Horner's method)
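The Horner's-method evaluation above can be sketched in a few lines of Java. This is an illustrative sketch, not the library source; the class name HornerHash and the method name hash are hypothetical.

```java
// Sketch of Horner's method for a string hash (class and method names are hypothetical).
final class HornerHash {
    // Computes s[0]*31^(L-1) + ... + s[L-1]*31^0 with one multiply and one add per character.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);   // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hash("call"));        // 3045982, the value in the worked example
        System.out.println("call".hashCode());   // the library method gives the same value
    }
}
```

For "call" the loop visits 99, 97, 108, 108 in that order, which is exactly the nested form 108 + 31·(108 + 31·(97 + 31·99)).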
Implementing hash code: strings
Performance optimization.
・Cache the hash value in an instance variable.
・Return cached value.
public final class String
{
   private int hash = 0;        // cache of hash code
   private final char[] s;
   ...
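Since the fragment above is cut off, here is a minimal sketch of the caching idea on a hypothetical immutable key type. CachedKey and its fields are assumptions made for illustration; this is not the actual String source.

```java
import java.util.Arrays;

// Hypothetical immutable key type that caches its hash code.
final class CachedKey {
    private final char[] s;
    private int hash = 0;            // cache of hash code; 0 means "not computed yet"

    CachedKey(String text) {
        this.s = text.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h != 0) return h;        // return cached value
        for (char c : s) {
            h = 31 * h + c;          // Horner's method over all characters
        }
        hash = h;                    // store in cache (a key that hashes to 0 is recomputed each time)
        return h;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof CachedKey && Arrays.equals(s, ((CachedKey) other).s);
    }
}
```

Caching is safe here only because the key is immutable: the character array never changes after construction, so the stored value cannot go stale.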
Implementing hash code: user-defined types
...
Hash code design
Basic rule. Need to use the whole key to compute hash code;
consult an expert for state-of-the-art hash codes.
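As a sketch of the basic rule for a user-defined type, the class below folds every field into the hash code with the same 31x + y pattern used for strings. The type Transaction, its fields, and the starting constant 17 are assumptions chosen for illustration.

```java
// Hypothetical user-defined key type; all names here are illustrative.
final class Transaction {
    private final String who;
    private final int day;
    private final double amount;

    Transaction(String who, int day, double amount) {
        this.who = who;
        this.day = day;
        this.amount = amount;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Transaction)) return false;
        Transaction that = (Transaction) o;
        return who.equals(that.who) && day == that.day
                && Double.compare(amount, that.amount) == 0;
    }

    @Override
    public int hashCode() {
        int h = 17;                               // arbitrary nonzero starting value
        h = 31 * h + who.hashCode();              // reference type: use its own hashCode()
        h = 31 * h + Integer.hashCode(day);       // primitive: hash via the wrapper type
        h = 31 * h + Double.hashCode(amount);
        return h;                                 // every field took part in the result
    }
}
```

Using exactly the fields that equals() compares keeps the two methods consistent, so equal keys always get equal hash codes.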
Modular hashing
[Code labels: bug; 1-in-a-billion bug; correct.]
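The code that went with the three labels above is missing from this text. One plausible reading, offered as a sketch, is three ways of turning a 32-bit hash code into an index between 0 and M - 1; the class and method names below are hypothetical.

```java
// Three candidate ways to map a hash code to an index in [0, m); names are hypothetical.
final class ModularHash {
    // bug: the remainder is negative whenever hashCode() is negative
    static int buggy(Object key, int m) {
        return key.hashCode() % m;
    }

    // 1-in-a-billion bug: Math.abs(Integer.MIN_VALUE) is still negative
    static int almost(Object key, int m) {
        return Math.abs(key.hashCode()) % m;
    }

    // correct: clear the sign bit first, so the operand of % is never negative
    static int correct(Object key, int m) {
        return (key.hashCode() & 0x7fffffff) % m;
    }
}
```

An Integer key hashes to its own value, so Integer.MIN_VALUE makes a convenient test case for the second variant: it is the single int whose absolute value does not fit in an int.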
Uniform hashing assumption
[Figure: balls thrown uniformly at random into bins numbered 0 to 15.]
Birthday problem. Expect two balls in the same bin after ~ $\sqrt{\pi M / 2}$ tosses.
Java's String hash codes distribute the keys of Tale of Two Cities uniformly.
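The birthday-problem estimate can be checked empirically. The sketch below throws balls into M bins until one bin receives a second ball and averages the count over many trials; the class name, method name, and fixed seed are assumptions made for this example.

```java
import java.util.Random;

// Simulation sketch for the birthday problem with m bins (names are hypothetical).
final class BirthdaySimulation {
    // Average number of tosses until some bin receives its second ball.
    static double averageTossesUntilCollision(int m, int trials, long seed) {
        Random random = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) {
            boolean[] occupied = new boolean[m];
            int tosses = 0;
            while (true) {
                int bin = random.nextInt(m);   // uniform in [0, m)
                tosses++;
                if (occupied[bin]) break;      // second ball in this bin: stop
                occupied[bin] = true;
            }
            total += tosses;
        }
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int m = 365;
        System.out.println("simulated: " + averageTossesUntilCollision(m, 20000, 42L));
        System.out.println("estimate:  " + Math.sqrt(Math.PI * m / 2));
    }
}
```

For M = 365 the estimate $\sqrt{\pi M / 2}$ is about 23.9, and the simulated average should land close to it, since the estimate is asymptotic rather than exact.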
Collisions
[Figure: hash("it") = 3 stores "it" at index 3; hash("times") = 3 maps to the same index (??).]
Separate chaining symbol table
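[Figure: separate-chaining trace. Recoverable key / hash / value rows: M 4 9, P 3 10, L 3 11, E 0 12. Recoverable chain of key-value nodes: M 9, H 5, C 4, R 3.]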
Separate chaining ST: Java implementation
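The implementation itself did not survive in this text. A minimal sketch of what a separate-chaining symbol table could look like follows; the fixed table size M = 97, the class name, and the Node layout are assumptions, not the slide's code.

```java
// Minimal separate-chaining symbol table sketch; all names and the size 97 are assumptions.
final class SeparateChainingHashST<Key, Value> {
    private static final int M = 97;           // number of chains (fixed for simplicity)
    private final Node[] st = new Node[M];     // st[i] is the first node of chain i

    private static final class Node {
        final Object key;
        Object val;
        final Node next;

        Node(Object key, Object val, Node next) {
            this.key = key;
            this.val = val;
            this.next = next;
        }
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % M;
    }

    @SuppressWarnings("unchecked")
    public Value get(Key key) {
        for (Node x = st[hash(key)]; x != null; x = x.next) {
            if (key.equals(x.key)) return (Value) x.val;   // search hit
        }
        return null;                                       // search miss
    }

    public void put(Key key, Value val) {
        int i = hash(key);
        for (Node x = st[i]; x != null; x = x.next) {
            if (key.equals(x.key)) { x.val = val; return; }  // key present: overwrite value
        }
        st[i] = new Node(key, val, st[i]);                   // key absent: add at front of chain
    }
}
```

Both operations hash to pick a chain and then scan only that chain, so the cost is proportional to the chain length rather than to the total number of keys.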
Analysis of separate chaining
[Figure: binomial distribution ($N = 10^4$, $M = 10^3$, $\alpha = 10$); horizontal axis 0 to 30, vertical axis 0 to .125, peak labeled (10, .12511...).]
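ST implementations: summary
implementation                       worst case                 average case                ordered  key
                                     search  insert  delete     search hit  insert  delete  ops?     interface
sequential search (unordered list)   N       N       N          N/2         N       N/2     no       equals()
binary search (ordered array)        lg N    N       N          lg N        N/2     N/2     yes      compareTo()
separate chaining                    lg N *  lg N *  lg N *     3-5 *       3-5 *   3-5 *   no       equals(), hashCode()
* under uniform hashing assumption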
Collision resolution: open addressing
[Figure: open-addressing table with st[0] = jocularly, st[1] = null, st[2] = listen, st[3] = suburban, a further null entry, and st[30000] = browsing.]
Linear probing hash table demo
[Figure: empty table st[] with M = 16 entries, indices 0 to 15.]
search K: hash(K) = 5
[Figure: st[] with M = 16 holding the keys P M A C S H L E R X; K is not in the table.]
search miss (return null)
Linear probing hash table summary
[Figure: st[] with M = 16 holding the keys P M A C S H L E R X.]
Linear probing ST implementation
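The implementation is missing from this text. A minimal sketch of a linear-probing symbol table follows, assuming a fixed capacity chosen by the caller and no resizing; the class name and the "table is full" check are assumptions.

```java
// Minimal linear-probing symbol table sketch (fixed capacity, no delete); names are assumptions.
final class LinearProbingHashST<Key, Value> {
    private final int m;          // table size
    private final Key[] keys;     // parallel arrays: keys[i] pairs with vals[i]
    private final Value[] vals;
    private int n = 0;            // number of key-value pairs stored

    @SuppressWarnings("unchecked")
    LinearProbingHashST(int capacity) {
        m = capacity;
        keys = (Key[]) new Object[m];
        vals = (Value[]) new Object[m];
    }

    private int hash(Key key) {
        return (key.hashCode() & 0x7fffffff) % m;
    }

    public Value get(Key key) {
        // Probe i, i+1, i+2, ... (wrapping around) until the key or an empty entry is found.
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) return vals[i];
        }
        return null;
    }

    public void put(Key key, Value val) {
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m) {
            if (keys[i].equals(key)) { vals[i] = val; return; }
        }
        // Always keep one entry empty so every probe sequence terminates.
        if (n == m - 1) throw new IllegalStateException("table is full");
        keys[i] = key;
        vals[i] = val;
        n++;
    }
}
```

A usage example matching the demo would be new LinearProbingHashST<String, Integer>(16) followed by put calls for the keys P, M, A, C, S, H, L, E, R, X; get("K") then returns null.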
Knuth's parking problem
[Figure: displacement = 3.]
Analysis of linear probing
search hit: $\sim \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$
search miss / insert: $\sim \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$
Pf.
Parameters.
・M too large ⇒ too many empty array entries.
・M too small ⇒ search time blows up.
・Typical choice: α = N / M ~ ½.
# probes for search hit is about 3/2
# probes for search miss is about 5/2
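The two probe counts quoted for α = ½ can be checked by substitution. This assumes the formulas on this slide are the standard linear-probing estimates, ½(1 + 1/(1 − α)) for a search hit and ½(1 + 1/(1 − α)²) for a search miss or insert.

```latex
\begin{align*}
\text{search hit:} \quad
  & \tfrac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)
    = \tfrac{1}{2}\left(1 + \frac{1}{1/2}\right)
    = \tfrac{1}{2}(1 + 2) = \tfrac{3}{2} \\
\text{search miss / insert:} \quad
  & \tfrac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)
    = \tfrac{1}{2}\left(1 + \frac{1}{1/4}\right)
    = \tfrac{1}{2}(1 + 4) = \tfrac{5}{2}
\end{align*}
```

Both expressions grow without bound as α approaches 1, which is the "search time blows up" case when M is too small.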
ST implementations: summary
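implementation                       worst case                 average case                ordered  key
                                     search  insert  delete     search hit  insert  delete  ops?     interface
sequential search (unordered list)   N       N       N          N/2         N       N/2     no       equals()
binary search (ordered array)        lg N    N       N          lg N        N/2     N/2     yes      compareTo()
separate chaining                    lg N *  lg N *  lg N *     3-5 *       3-5 *   3-5 *   no       equals(), hashCode()
linear probing                       lg N *  lg N *  lg N *     3-5 *       3-5 *   3-5 *   no       equals(), hashCode()
* under uniform hashing assumption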
War story: String hashing in Java
http://www.cs.princeton.edu/introcs/13loop/Hello.java
http://www.cs.princeton.edu/introcs/13loop/Hello.class
http://www.cs.princeton.edu/introcs/13loop/Hello.html
http://www.cs.princeton.edu/introcs/12type/index.html
War story: algorithmic complexity attacks
Diversion: one-way hash functions
One-way hash function. "Hard" to find a key that will hash to a desired
value (or two keys that hash to same value).
known to be insecure
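To illustrate a one-way hash function in Java, the sketch below computes a digest with java.security.MessageDigest. The choice of SHA-256 and the helper name sha256Hex are assumptions made for this example; the slide does not say which function to use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hex-encoded SHA-256 digest of a string (class and method names are hypothetical).
final class OneWayHashDemo {
    static String sha256Hex(String message) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : bytes) {
                hex.append(String.format("%02x", b));   // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("abc"));
    }
}
```

A digest like this is far too expensive for symbol-table use, where a few arithmetic operations per key are the goal; it is meant for settings where finding collisions must be hard.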
Separate chaining vs. linear probing
Separate chaining.
・Easier to implement delete.
・Performance degrades gracefully.
・Clustering less sensitive to poorly-designed hash function.
Linear probing.
・Less wasted space.
・Better cache performance.
Q. How to delete?
Q. How to resize?
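One common way to answer both questions for linear probing is sketched below, under stated assumptions: delete removes the key and then reinserts the rest of its cluster, and the table doubles when half full and halves when one-eighth full. The class name, the initial size 4, and both thresholds are illustrative choices.

```java
// Linear probing with delete and resize; a sketch, with assumed names and thresholds.
final class ResizingLinearProbingST<Key, Value> {
    private int m = 4;            // table size
    private int n = 0;            // number of key-value pairs
    private Key[] keys;
    private Value[] vals;

    @SuppressWarnings("unchecked")
    ResizingLinearProbingST() {
        keys = (Key[]) new Object[m];
        vals = (Value[]) new Object[m];
    }

    private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % m; }

    public int size() { return n; }

    public Value get(Key key) {
        for (int i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) return vals[i];
        return null;
    }

    public void put(Key key, Value val) {
        if (n >= m / 2) resize(2 * m);             // keep the table at most half full
        int i;
        for (i = hash(key); keys[i] != null; i = (i + 1) % m)
            if (keys[i].equals(key)) { vals[i] = val; return; }
        keys[i] = key;
        vals[i] = val;
        n++;
    }

    public void delete(Key key) {
        int i = hash(key);
        while (keys[i] != null && !keys[i].equals(key)) i = (i + 1) % m;
        if (keys[i] == null) return;               // key not present
        keys[i] = null;
        vals[i] = null;
        n--;
        // Reinsert the keys that follow in the same cluster; leaving the hole
        // in place would cut them off from searches that start before it.
        for (i = (i + 1) % m; keys[i] != null; i = (i + 1) % m) {
            Key k = keys[i];
            Value v = vals[i];
            keys[i] = null;
            vals[i] = null;
            n--;
            put(k, v);
        }
        if (n > 0 && n <= m / 8) resize(m / 2);    // shrink a sparse table
    }

    @SuppressWarnings("unchecked")
    private void resize(int capacity) {
        Key[] oldKeys = keys;
        Value[] oldVals = vals;
        m = capacity;
        n = 0;
        keys = (Key[]) new Object[m];
        vals = (Value[]) new Object[m];
        for (int i = 0; i < oldKeys.length; i++)   // rehash everything into the new table
            if (oldKeys[i] != null) put(oldKeys[i], oldVals[i]);
    }
}
```

Resizing must rehash every key because the index depends on M; copying entries to the same positions in a larger array would break later searches.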
Hashing: variations on the theme
Hash tables vs. balanced search trees
Hash tables.
・Simpler to code.
・No effective alternative for unordered keys.
・Faster for simple keys (a few arithmetic ops versus log N compares).
・Better system support in Java for strings (e.g., cached hash code).
Balanced search trees.
・Stronger performance guarantee.
・Support for ordered ST operations.
・Easier to implement compareTo() correctly than equals() and hashCode().
Java system includes both.
・Red-black BSTs: java.util.TreeMap, java.util.TreeSet.
・Hash tables: java.util.HashMap, java.util.IdentityHashMap.
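The trade-off can be seen directly with the two library classes named above. In this sketch, the class name MapChoiceDemo and the sample words are assumptions; the library calls are standard java.util methods.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch contrasting a hash table with a red-black BST from the Java library.
final class MapChoiceDemo {
    // Builds an ordered index from each word to its position in the array.
    static TreeMap<String, Integer> orderedIndex(String[] words) {
        TreeMap<String, Integer> st = new TreeMap<>();
        for (int i = 0; i < words.length; i++) st.put(words[i], i);
        return st;
    }

    public static void main(String[] args) {
        String[] words = { "it", "was", "the", "best", "of", "times" };

        Map<String, Integer> hashed = new HashMap<>();   // needs equals() and hashCode()
        for (int i = 0; i < words.length; i++) hashed.put(words[i], i);
        System.out.println(hashed.get("times"));         // 5

        TreeMap<String, Integer> ordered = orderedIndex(words);  // needs compareTo()
        System.out.println(ordered.firstKey());          // best
        System.out.println(ordered.ceilingKey("team"));  // the
    }
}
```

Only the TreeMap can answer the last two queries efficiently; a HashMap offers no ordered operations, which is the price of its faster search and insert.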