Hash-table principles. Closed-bucket and open-bucket hash tables. Searching, insertion, deletion. Hash-table design.
All words with initial letter A share bucket 0; all words with initial letter Z share bucket 25.
This choice is convenient for illustration but poor in practice: collisions are likely to be frequent in some buckets.
Note that hashCode is consistent. We can use it to implement a hash function for a hash table with m buckets:
    int hash (Object k) { return Math.abs(k.hashCode()) % m; }
Math.abs returns a nonnegative integer (except for Integer.MIN_VALUE, whose absolute value overflows and stays negative). Modulo-m arithmetic then gives an integer in the range 0…m−1.
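A minimal sketch of the same idea, assuming a hypothetical class HashCodeBuckets that takes the number of buckets m as a parameter. It uses Math.floorMod instead of Math.abs followed by %, since floorMod yields a result in 0…m−1 for every int, including Integer.MIN_VALUE.

```java
class HashCodeBuckets {
    // Hypothetical helper: maps any non-null key to a bucket index in 0..m-1.
    static int hash(Object k, int m) {
        // floorMod never returns a negative result for positive m,
        // even when k.hashCode() is Integer.MIN_VALUE.
        return Math.floorMod(k.hashCode(), m);
    }

    public static void main(String[] args) {
        int m = 26;
        System.out.println(hash("hydrogen", m));
        System.out.println(hash(Integer.valueOf(42), m));                 // 42 % 26 = 16
        System.out.println(hash(Integer.valueOf(Integer.MIN_VALUE), m)); // still in 0..25
    }
}
```

Because hashCode is consistent, calling hash twice on the same key always gives the same bucket.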
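Examples:
Class of k    Result of k.hashCode()
String        k[0]×31^(n−1) + k[1]×31^(n−2) + … + k[n−1], where k[i] is the i-th character of k and n is its length
Integer       integer value of k
Date          (high 32 bits of k) exclusive-or (low 32 bits of k), where k is expressed in milliseconds since 1970-01-01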
[Figures: two maps from chemical-element symbols to atomic numbers (one containing Kr 36, Ne 10, and Xe 54; the other containing Be 4, Ca 20, H 1, Mg 12, Na 11, Rb 37, and Sr 38), each shown together with the hash table that represents it.]
CBHTs: analysis
Analysis of the CBHT search/insertion/deletion algorithms (counting comparisons):
Let the number of entries be n.
In the best case, no bucket contains more than (say) 2 entries:
    Max. no. of comparisons = 2
    Best-case time complexity is O(1).
In the worst case, one bucket contains all n entries:
    Max. no. of comparisons = n
    Worst-case time complexity is O(n).
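The comparison counts above can be observed with a small closed-bucket table. This is a minimal sketch, not the slides' implementation: the class CbhtSketch, its comparisons counter, and the pluggable hash function are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class CbhtSketch {
    private static final class Entry {
        final String key;
        int val;
        Entry(String key, int val) { this.key = key; this.val = val; }
    }

    private final List<List<Entry>> buckets = new ArrayList<>();
    private final ToIntFunction<String> hash;
    int comparisons = 0; // key comparisons made by the most recent operation

    CbhtSketch(int m, ToIntFunction<String> hash) {
        for (int i = 0; i < m; i++) buckets.add(new ArrayList<>());
        this.hash = hash;
    }

    // Returns the value stored for key, or null if key is absent.
    Integer search(String key) {
        comparisons = 0;
        for (Entry e : buckets.get(hash.applyAsInt(key))) {
            comparisons++;
            if (e.key.equals(key)) return e.val;
        }
        return null;
    }

    // Adds the entry, or replaces the value if key is already present.
    void insert(String key, int val) {
        comparisons = 0;
        List<Entry> bucket = buckets.get(hash.applyAsInt(key));
        for (Entry e : bucket) {
            comparisons++;
            if (e.key.equals(key)) { e.val = val; return; }
        }
        bucket.add(new Entry(key, val));
    }

    // Removes the entry for key, if any.
    void delete(String key) {
        comparisons = 0;
        List<Entry> bucket = buckets.get(hash.applyAsInt(key));
        for (int i = 0; i < bucket.size(); i++) {
            comparisons++;
            if (bucket.get(i).key.equals(key)) { bucket.remove(i); return; }
        }
    }

    public static void main(String[] args) {
        String[] keys = {"H", "He", "Li", "Be", "B", "C", "N", "O"};
        // Worst case: a degenerate hash function puts every key in bucket 0.
        CbhtSketch worst = new CbhtSketch(26, k -> 0);
        // Better spread: hash on the initial letter.
        CbhtSketch spread = new CbhtSketch(26, k -> k.charAt(0) - 'A');
        for (int i = 0; i < keys.length; i++) {
            worst.insert(keys[i], i + 1);
            spread.insert(keys[i], i + 1);
        }
        worst.search("O");
        spread.search("O");
        System.out.println(worst.comparisons);  // 8: all n entries share one bucket
        System.out.println(spread.comparisons); // 1: "O" is alone in its bucket
    }
}
```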
CBHTs: design
CBHT design consists of:
    choosing the number of buckets m;
    choosing the hash function hash.
Design aims:
    collisions are infrequent;
    entries are distributed evenly among the buckets, such that few buckets contain more than about 2 entries.
Example 2 (2)
hash(w) can depend on any of w's letters and/or length.
Example 2 (3)
Consider m = 520, hash(w) = 26 × (length of w − 1) + (initial letter of w − 'A').
    Too few buckets. Load factor = 1000/520 ≈ 1.9.
    Very uneven distribution. Since few words have length 0–2, buckets 0–51 will be sparsely populated. Since initial letter Z is uncommon, buckets 25, 51, 77, 103, … will be sparsely populated. And so on.
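A minimal sketch of this hash function, reading the formula as 26 × (length − 1) + (initial letter − 'A'), which agrees with the sparsely populated buckets 25, 51, 77, 103 listed for initial letter Z. The class name WordHashExample and the case-insensitive handling are assumptions; words are assumed to be non-empty, at most 20 letters long, and to start with a letter A to Z.

```java
class WordHashExample {
    static final int M = 520; // 26 initial letters x 20 word lengths

    // hash(w) = 26 * (length of w - 1) + (initial letter of w - 'A')
    static int hash(String w) {
        int initial = Character.toUpperCase(w.charAt(0)) - 'A';
        return 26 * (w.length() - 1) + initial;
    }

    public static void main(String[] args) {
        System.out.println(hash("A"));    // 26*0 + 0  = 0
        System.out.println(hash("to"));   // 26*1 + 19 = 45
        System.out.println(hash("Zoo"));  // 26*2 + 25 = 77
        System.out.println(hash("hash")); // 26*3 + 7  = 85
    }
}
```

Every word of a given length lands in the same block of 26 consecutive buckets, which is why the blocks for rare lengths stay nearly empty.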
[Figures: open-bucket hash tables representing maps from element symbols to atomic numbers. Each bucket is marked as occupied or never-occupied, and runs of adjacent occupied buckets are labelled as clusters.]
OBHT methods
[Figures: snapshots of the OBHT of chemical elements (buckets 0–25), including the insertion of Fr 87 and then B 5 next to the cluster that already holds Be 4, Ca 20, Cs 55, and Ba 56.]
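public void insert (Object key, Object val) {
    BucketEntry newest = new BucketEntry(key, val);
    int b = hash(key);   // home bucket of the key
    for (;;) {
        BucketEntry old = buckets[b];
        if (old == null) {
            // Never-occupied bucket: the key is absent, so store the new entry here.
            if (++load == buckets.length) ;   // table is now full; no action is shown
            buckets[b] = newest;
            return;
        }
        ...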
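The insertion fragment above stops before its remaining cases. The following is a minimal, self-contained sketch of an OBHT with step length 1, not the original code: the class ObhtSketch, the FORMER marker object, the indexOf helper, and the initial-letter hash function are all assumptions, and the load counter of the fragment is left out for brevity.

```java
class ObhtSketch {
    private static final class BucketEntry {
        final Object key;
        Object val;
        BucketEntry(Object key, Object val) { this.key = key; this.val = val; }
        // Assumed marker object for a formerly-occupied bucket.
        static final BucketEntry FORMER = new BucketEntry(null, null);
    }

    private final BucketEntry[] buckets;

    ObhtSketch(int m) { buckets = new BucketEntry[m]; }

    // Illustrative hash on the initial letter; assumes capitalised string keys.
    private int hash(Object key) {
        return (key.toString().charAt(0) - 'A') % buckets.length;
    }

    // Returns the bucket holding key, or -1 if key is absent.
    int indexOf(Object key) {
        int b = hash(key);
        for (int i = 0; i < buckets.length; i++) {
            BucketEntry e = buckets[b];
            if (e == null) return -1; // a never-occupied bucket ends the search
            if (e != BucketEntry.FORMER && e.key.equals(key)) return b;
            b = (b + 1) % buckets.length; // step length 1
        }
        return -1;
    }

    Object search(Object key) {
        int b = indexOf(key);
        return b < 0 ? null : buckets[b].val;
    }

    void insert(Object key, Object val) {
        int b = hash(key);
        int reuse = -1; // first bucket where a new entry could be stored
        for (int i = 0; i < buckets.length; i++) {
            BucketEntry old = buckets[b];
            if (old == null) {                      // never-occupied: key is absent
                if (reuse < 0) reuse = b;
                break;
            } else if (old == BucketEntry.FORMER) { // formerly-occupied: reusable
                if (reuse < 0) reuse = b;
            } else if (old.key.equals(key)) {       // key present: replace its value
                old.val = val;
                return;
            }
            b = (b + 1) % buckets.length;
        }
        if (reuse < 0) throw new IllegalStateException("hash table is full");
        buckets[reuse] = new BucketEntry(key, val);
    }

    // Marks the bucket as formerly-occupied so later searches probe past it.
    void delete(Object key) {
        int b = indexOf(key);
        if (b >= 0) buckets[b] = BucketEntry.FORMER;
    }
}
```

With m = 26, inserting Be, Ca, Cs, Ba, Fr, and B in that order places them in buckets 1 to 6. After Ca is deleted, a search for Cs still succeeds because bucket 2 is formerly-occupied rather than never-occupied.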
[Figures: OBHT deletion illustrated on the table of chemical elements (buckets 0–25). Ca is deleted first and then Ba; each vacated bucket is marked formerly-occupied.]
OBHTs: analysis
Analysis of the OBHT search/insertion/deletion algorithms (counting comparisons):
Let the number of entries be n.
In the best case, no cluster contains more than (say) 4 entries:
    Max. no. of comparisons = 4
    Best-case time complexity is O(1).
In the worst case, one cluster contains all n entries:
    Max. no. of comparisons = n
    Worst-case time complexity is O(n).
OBHTs: design
OBHT design consists of:
    choosing the number of buckets m;
    choosing the hash function hash;
    choosing the step length s (explained later).
Design aims:
    collisions are infrequent;
    entries are distributed evenly over the hash table, such that few clusters contain more than about 4 entries.
The hash function should distribute the entries evenly over the buckets, with few long clusters.
In an OBHT with s = 1, a cluster will form when several entries fall into the same or adjacent buckets.
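The effect of the step length can be seen by listing the buckets a probe visits. This is a minimal sketch: the class ProbeSequenceSketch and the table size m = 23 are hypothetical, chosen only so that a step of 20 from bucket 2 reaches bucket 22.

```java
import java.util.Arrays;

class ProbeSequenceSketch {
    // Returns the first `count` buckets examined when probing starts at
    // bucket h and advances by step length s in a table of m buckets.
    static int[] probes(int h, int s, int m, int count) {
        int[] visited = new int[count];
        int b = h;
        for (int i = 0; i < count; i++) {
            visited[i] = b;
            b = (b + s) % m;
        }
        return visited;
    }

    public static void main(String[] args) {
        int m = 23; // hypothetical table size
        // With s = 1, keys whose home buckets are 2 and 3 share most of their probe path.
        System.out.println(Arrays.toString(probes(2, 1, m, 4)));  // [2, 3, 4, 5]
        System.out.println(Arrays.toString(probes(3, 1, m, 4)));  // [3, 4, 5, 6]
        // With a key-dependent step, the paths diverge after the first probe.
        System.out.println(Arrays.toString(probes(2, 20, m, 4))); // [2, 22, 19, 16]
    }
}
```

A step length visits every bucket only if it shares no common factor with m, which is one reason to pick a prime m when the step varies by key.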
[Figures: three snapshots of an OBHT with double hashing holding Be 4, Ca 20, K 19, Li 3, Mg 12, He 2, Rb 37, and Sr 38. Ba 56 is inserted first; then Cs 55, with hash(Cs) = 2 and step(Cs) = 20, is placed in bucket 22.]
The hash function distributes keys uniformly among the buckets (i.e., the probability that a key is mapped to any particular bucket is 1/m). In each trial, all three hash tables are loaded with the same set of n randomly-generated keys.
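A comparison of this kind can be approximated with a short simulation. The sketch below covers only the two OBHT variants (step length 1 versus a key-dependent step) and does not reproduce the original experiment or its results; the class name, the table size m = 1009, the key count, the seed, and both hash formulas are assumptions.

```java
import java.util.Random;

class ProbingExperimentSketch {
    // Inserts the keys into an empty m-bucket OBHT and returns the mean number
    // of buckets examined per insertion. Duplicate keys, if any, are stored as
    // separate entries. Requires keys.length < m and nonnegative keys.
    static double meanProbes(int[] keys, int m, boolean doubleHashing) {
        boolean[] occupied = new boolean[m];
        long probes = 0;
        for (int k : keys) {
            int b = k % m;                                   // home bucket
            int s = doubleHashing ? 1 + (k / m) % (m - 1) : 1; // step in 1..m-1
            probes++;
            while (occupied[b]) {
                b = (b + s) % m;
                probes++;
            }
            occupied[b] = true;
        }
        return (double) probes / keys.length;
    }

    public static void main(String[] args) {
        int m = 1009; // prime, so every step length visits all buckets
        int n = 900;  // load factor of about 0.89
        Random rng = new Random(42);
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = rng.nextInt(Integer.MAX_VALUE);
        System.out.println("step length 1:  " + meanProbes(keys, m, false));
        System.out.println("double hashing: " + meanProbes(keys, m, true));
    }
}
```

Both tables receive the same keys, as in the trials described above, so any difference in the means comes from the probing strategy alone.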
[Figures: hash tables from the comparison; the OBHT with double hashing is annotated as having shorter clusters.]
Example 4 (2)
Consider m = 100, hash(id) = first two digits of id.
Far too few buckets. Load factor = 6000/100 = 60. Very uneven distribution. E.g., in academic year 2001–02, most ids start with 98, 99, 00, or 01.
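The uneven distribution can be checked with a short sketch. The id format is not given in the text, so the class StudentIdHashExample assumes hypothetical 7-digit ids stored as strings, with 1500 ids for each of the four common prefixes.

```java
class StudentIdHashExample {
    // hash(id) = first two digits of id, for m = 100 buckets.
    static int hash(String id) {
        return Integer.parseInt(id.substring(0, 2));
    }

    // Counts how many of 6000 hypothetical ids fall into each bucket.
    static int[] bucketSizes() {
        int[] sizes = new int[100];
        String[] prefixes = {"98", "99", "00", "01"};
        for (String p : prefixes) {
            for (int i = 0; i < 1500; i++) {
                String id = p + String.format("%05d", i); // e.g. "9800042"
                sizes[hash(id)]++;
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = bucketSizes();
        for (int b = 0; b < sizes.length; b++) {
            if (sizes[b] > 0) System.out.println("bucket " + b + ": " + sizes[b]);
        }
    }
}
```

Under these assumptions only buckets 0, 1, 98, and 99 receive entries, 1500 each, while the other 96 buckets stay empty.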
Example 4 (3)
Consider OBHT with s = 1.
Four clusters of about 1500 entries.
Operation   Algorithm         Time complexity
                              best    worst
add         CBHT insertion    O(1)    O(n)
remove      CBHT deletion     O(1)    O(n)