Hashing

Week 14: Hashing
STIA2024
Data Structures & Algorithm Analysis
Chapter Contents
 What is Hashing?
 Hash Functions
 Computing Hash Codes
 Compressing a Hash Code into an Index for the Hash Table
 Resolving Collisions
 Open Addressing with Linear Probing
 Open Addressing with Quadratic Probing
 Separate Chaining
Learning Objectives
 To describe the basic idea of hashing,
 To describe the purpose of a hash table and a hash function,
 To describe how a hash function compresses a hash code into an index to the hash table,
 To explain what collisions are and why they occur,
 To describe open addressing as a method to resolve collisions,
 To describe linear probing and quadratic probing as particular open addressing schemes,
 To describe separate chaining as a method to resolve collisions, and
 To describe the relative efficiencies of various collision resolution techniques.
Chapter Contents (ctd.)
 Efficiency
 The Load Factor
 The Cost of Open Addressing
 The Cost of Separate Chaining
What is Hashing?
 A technique that determines an index or location
for storage of an item in a data structure
 The hash function receives the search key
 Returns the index of an element in an array
called the hash table
 The index is known as the hash index
 Hashing can be an excellent choice when searching is the primary task.
 A technique that ideally can result in O(1) search
time.
 A perfect hash function maps each search key into
a different integer suitable as an index to the hash
table
What is Hashing?
Fig. 1: A hash function indexes its hash table.

What is Hashing?
 Two steps of the hash function
 Convert the search key into an integer
called the hash code
 Compress the hash code into the range
of indices for the hash table
 Typical hash functions are not perfect
 They can allow more than one search
key to map into a single index
 This is known as a collision
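The two steps, and a collision, can be sketched in Java as follows. This is only an illustration: the class name TwoStepHash, the method indexFor, and the table size 11 are assumptions made for the example, not part of the lecture code.

```java
// Hypothetical illustration: the names and the table size are assumptions.
class TwoStepHash {
    static int indexFor(Object key, int tableSize) {
        // Step 1: convert the search key into an integer, the hash code.
        int hashCode = key.hashCode();
        // Step 2: compress the hash code into the range 0 .. tableSize - 1.
        return Math.floorMod(hashCode, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 11;
        // An Integer's hash code is its own value, so the keys 5 and 16
        // both compress to index 5: a collision.
        System.out.println(indexFor(5, tableSize));  // 5
        System.out.println(indexFor(16, tableSize)); // 5
    }
}
```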
What is Hashing?
Fig. 2: A collision caused by the hash function h

Hash Functions
 General characteristics of a good hash function
 Minimize collisions
 Distribute entries uniformly
throughout the hash table
 Be fast to compute
Computing Hash Codes
 We will override the hashCode method of
Object
 Guidelines
 If a class overrides the method equals, it should
override hashCode
 If the method equals considers two objects equal,
hashCode must return the same value for both
objects
 If an object invokes hashCode more than once during one execution of a program, and its data has not changed, hashCode must return the same hash code
 An object's hash code during one execution of a program can differ from its hash code during another execution of the same program
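A minimal sketch of a class that follows these guidelines. The class Name and its fields are hypothetical and serve only as an example; Objects.hash is the standard java.util helper.

```java
import java.util.Objects;

// Hypothetical key class, used only to illustrate the guidelines.
final class Name {
    private final String first;
    private final String last;

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Name)) return false;
        Name that = (Name) other;
        return first.equals(that.first) && last.equals(that.last);
    }

    // Built from exactly the fields that equals compares, so two objects
    // that are equal always return the same hash code.
    @Override
    public int hashCode() {
        return Objects.hash(first, last);
    }
}
```

Because the fields are final, repeated calls to hashCode on the same object return the same value during one execution.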
Computing Hash Codes
 The hash code for a string, s
int g = 31; // g is a positive constant; java.lang.String's hashCode uses 31
int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++)
    hash = g * hash + s.charAt(i);
 Hash code for a primitive type
 Use the primitive typed key itself (e.g. int)
 Manipulate internal binary representations
 Use folding (a bit-wise boolean operation such as exclusive or)
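Folding can be sketched for a 64-bit long key as follows. The class and method names are assumptions for illustration; the expression itself is the same computation that java.lang.Long documents for its hashCode.

```java
// Hypothetical helper; the class and method names are assumptions.
class FoldingHash {
    // Folds a 64-bit key into 32 bits by combining its upper and lower
    // halves with exclusive or.
    static int hashCodeOf(long key) {
        return (int) (key ^ (key >>> 32));
    }

    public static void main(String[] args) {
        long key = 0x0000000100000002L; // upper half is 1, lower half is 2
        System.out.println(hashCodeOf(key)); // 3, that is 1 XOR 2
    }
}
```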
Compressing a Hash Code
 Must compress the hash code so it fits into
the index range
 Typical method for a code c is to compute
c modulo n (c%n)
 n is a prime number (the size of the
table)
 Index will then be between 0 and n – 1
private int getHashIndex(Object key)
{
    int hashIndex = key.hashCode() % hashTable.length;
    if (hashIndex < 0) // % gives a negative result when the hash code is negative
        hashIndex = hashIndex + hashTable.length;
    return hashIndex;
} // end getHashIndex
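The test for a negative index is needed because hashCode can return a negative int, and the Java % operator takes the sign of its left operand. A small sketch, in which the class CompressDemo and the method compress are assumed names that repeat the logic of getHashIndex with the table length passed as a parameter:

```java
// Hypothetical demonstration class; the names are assumptions.
class CompressDemo {
    // Same logic as getHashIndex, with the table length as a parameter.
    static int compress(int hashCode, int tableLength) {
        int hashIndex = hashCode % tableLength;
        if (hashIndex < 0)
            hashIndex = hashIndex + tableLength;
        return hashIndex;
    }

    public static void main(String[] args) {
        System.out.println(-7 % 5);               // -2, not a valid index
        System.out.println(compress(-7, 5));      // 3
        System.out.println(Math.floorMod(-7, 5)); // 3, the library equivalent
    }
}
```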
Resolving Collisions
 Options when the hash function returns a location already used in the table
 Use another location in the table
(open addressing)
 Change the structure of the hash table
so that each array location can
represent multiple values (separate
chaining)
Open Addressing with Linear Probing
 An open addressing scheme locates an alternate location in the hash table that is available, or open.
 Locating an open location in a hash table is
called probing.
 Linear probing
 Resolves a collision by examining consecutive locations in the hash table, beginning at the original hash index, until it locates the next available location.
 If a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, …, wrapping around to the start of the table if necessary
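A minimal sketch of the search for an open location under linear probing. It assumes that null marks an open location and that the table is not full; the class and method names are illustrative and not taken from the lecture.

```java
// Hypothetical sketch; assumes at least one location in the table is null.
class LinearProbe {
    // Returns the first open (null) location at or after index k,
    // wrapping around to the start of the table.
    static int probe(Object[] hashTable, int k) {
        int index = k;
        while (hashTable[index] != null) {
            index = (index + 1) % hashTable.length;
        }
        return index;
    }

    public static void main(String[] args) {
        Object[] hashTable = new Object[7];
        hashTable[3] = "first";
        hashTable[4] = "second";
        System.out.println(probe(hashTable, 3)); // 5
    }
}
```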
Probing
Fig. 3: The effect of linear probing after adding four entries whose search keys hash to the same index.
Probing
Fig. 4: A revision of the hash table shown in Fig. 3 when linear probing resolves collisions; each entry contains a search key and its associated value.
Removals
Fig. 5: A hash table if remove used null to remove entries.
Removals
 We need to distinguish among three
kinds of locations in the hash table
1. Occupied
 The location references an entry in the
dictionary
2. Empty
 The location contains null and always did
3. Available
 The location's entry was removed from the
dictionary
Probing
Fig. 6: A linear probe sequence (a) after adding an entry; (b) after removing two entries;
Probing
Fig. 6: A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.
Searches that Dictionary Operations Require
 To retrieve an entry
 Search the probe sequence for the key
 Examine entries that are present, ignore locations
in available state
 Stop search when key is found or null reached
 To remove an entry
 Search the probe sequence same as for retrieval
 If key is found, mark location as available
 To add an entry
 Search probe sequence same as for retrieval
 Note first available slot
 Use available slot if the key is not found
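The three searches can be sketched with linear probing and an explicit state for each location (empty, occupied, available). This is a simplified illustration, not the lecture's implementation: the class ProbingTable and all of its member names are assumptions, and the table never resizes.

```java
import java.util.Arrays;

// Minimal sketch of a linear-probing dictionary; all names are assumptions.
class ProbingTable {
    private enum State { EMPTY, OCCUPIED, AVAILABLE }

    private final State[] states;
    private final Object[] keys;
    private final Object[] values;

    ProbingTable(int size) {
        states = new State[size];
        Arrays.fill(states, State.EMPTY);
        keys = new Object[size];
        values = new Object[size];
    }

    private int homeIndex(Object key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Follows the probe sequence. Returns the key's index, or -1 if an
    // empty location (or a full cycle) ends the search.
    int locate(Object key) {
        int index = homeIndex(key);
        for (int probes = 0; probes < keys.length; probes++) {
            if (states[index] == State.EMPTY) return -1;
            if (states[index] == State.OCCUPIED && keys[index].equals(key)) return index;
            index = (index + 1) % keys.length; // skip available or non-matching
        }
        return -1;
    }

    Object getValue(Object key) {
        int index = locate(key);
        return index < 0 ? null : values[index];
    }

    boolean remove(Object key) {
        int index = locate(key);
        if (index < 0) return false;
        states[index] = State.AVAILABLE; // not EMPTY, so later searches go past it
        keys[index] = null;
        values[index] = null;
        return true;
    }

    void add(Object key, Object value) {
        int index = homeIndex(key);
        int firstOpen = -1; // first available or empty location seen
        for (int probes = 0; probes < keys.length; probes++) {
            if (states[index] == State.EMPTY) {
                if (firstOpen < 0) firstOpen = index;
                break; // the key cannot lie further along the sequence
            }
            if (states[index] == State.AVAILABLE) {
                if (firstOpen < 0) firstOpen = index;
            } else if (keys[index].equals(key)) {
                values[index] = value; // key already present: replace its value
                return;
            }
            index = (index + 1) % keys.length;
        }
        if (firstOpen < 0) throw new IllegalStateException("hash table is full");
        states[firstOpen] = State.OCCUPIED;
        keys[firstOpen] = key;
        values[firstOpen] = value;
    }
}
```

Marking a removed location as available rather than empty is what lets a later retrieval continue past it to entries further along the same probe sequence.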
Open Addressing with Quadratic Probing
 Change the probe sequence
 Given the original hash index k
 Probe to k + 1, k + 2², k + 3², …, k + n²
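 Reaches about half of the locations in the hash table if the table size is a prime number, so an addition is guaranteed to find an open location only while the table is at most half full
 Avoids primary clustering
 But can lead to secondary clustering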
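A short experiment lists which locations a quadratic probe sequence visits. The class and method names are assumptions for illustration, and k is taken to be a non-negative hash index. For the prime table size 11, the sequence starting at index 0 visits only 6 of the 11 locations, which is why quadratic probing is normally used with a table that is at most half full.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch; the class and method names are assumptions.
class QuadraticProbe {
    // Returns the distinct locations visited by probing k, k + 1, k + 4,
    // k + 9, ... (each taken modulo the table size), in order of first visit.
    static Set<Integer> locations(int k, int tableSize) {
        Set<Integer> visited = new LinkedHashSet<>();
        for (int j = 0; j < tableSize; j++) {
            visited.add((k + j * j) % tableSize);
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(locations(0, 11)); // [0, 1, 4, 9, 5, 3]
    }
}
```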
Open Addressing with Quadratic Probing
Fig. 7: A probe sequence of length 5 using quadratic probing.
Separate Chaining
 Alter the structure of the hash table
 Each location can represent multiple
values
 Each location is called a bucket
 Bucket can be a/an
 List
 Sorted list
 Chain of linked nodes
 Array
 Vector
 Separate chaining resolves collisions by using buckets that are linked chains.
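A minimal sketch of separate chaining in which each bucket is a java.util.LinkedList. To keep it short it stores integer search keys only, without associated values; the class ChainedSet and its method names are assumptions for illustration.

```java
import java.util.LinkedList;

// Minimal sketch: each bucket is a linked chain of keys. Names are assumptions.
class ChainedSet {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedSet(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    private int indexFor(int key) {
        return Math.floorMod(Integer.hashCode(key), buckets.length);
    }

    // Adds the key at the front of its bucket's chain unless it is already there.
    boolean add(int key) {
        LinkedList<Integer> chain = buckets[indexFor(key)];
        if (chain.contains(key)) return false;
        chain.addFirst(key);
        return true;
    }

    boolean contains(int key) {
        return buckets[indexFor(key)].contains(key);
    }

    int chainLength(int index) {
        return buckets[index].size();
    }
}
```

Keys that hash to the same index simply share a chain, so a collision never forces a search of other locations in the table.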
Separate Chaining
Fig. 9: A hash table for use with separate chaining; each bucket is a chain of linked nodes.
Separate Chaining
Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (a) duplicate and unsorted;
Separate Chaining
Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (b) distinct and unsorted;
Separate Chaining
Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (c) distinct and sorted.
Separate Chaining
 Separate chaining provides an efficient and simple way to resolve collisions.
 However, separate chaining requires more
memory than open addressing.
Efficiency Observations
 Successful retrieval or removal

 Same efficiency as successful search
 Unsuccessful retrieval or removal
 Same efficiency as unsuccessful search
 Successful addition
 Same efficiency as unsuccessful search
 Unsuccessful addition
 Same efficiency as successful search
Load Factor
 A perfect hash function is not always possible or practical
 Thus, collisions are likely to occur
 As the hash table fills
 Collisions occur more often
 The measure of table fullness is the load factor, λ (the ratio of the number of entries in the dictionary to the number of locations in the hash table)
Load Factor
 The load factor λ is zero when the hash table is empty
 For open addressing, the maximum value of λ is 1, when the hash table is full
 λ does not measure the number of locations in the available state
 For separate chaining, λ has no maximum value
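The load factor is a simple ratio. A small sketch, with an assumed class and method name and an assumed table size of 11:

```java
// Hypothetical helper; the names and the sample sizes are assumptions.
class LoadFactor {
    // Number of entries divided by the number of locations in the table.
    static double loadFactor(int numberOfEntries, int tableSize) {
        return (double) numberOfEntries / tableSize;
    }

    public static void main(String[] args) {
        System.out.println(loadFactor(0, 11));  // 0.0, empty table
        System.out.println(loadFactor(11, 11)); // 1.0, full table under open addressing
        System.out.println(loadFactor(22, 11)); // 2.0, possible only with separate chaining
    }
}
```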
Cost of Open Addressing
Note: Reasonable efficiency requires a load factor λ < 0.5
Fig. 11: The average number of comparisons required by a search of the hash table for given values of the load factor when using linear probing.
Cost of Open Addressing
Note: For quadratic probing or double hashing, the load factor λ should be < 0.5
Fig. 12: The average number of comparisons required by a search of the hash table for given values of the load factor when using either quadratic probing or double hashing.
Cost of Separate Chaining
Note: Reasonable efficiency requires a load factor λ < 1
Fig. 13: Average number of comparisons required by a search of the hash table for given values of the load factor when using separate chaining.
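The average costs can be tabulated from the classic textbook estimates, usually attributed to Knuth. It is an assumption that Figs. 11 to 13 plot exactly these formulas; the class HashCost and its method names are likewise made up for this sketch. The open addressing formulas apply for 0 < λ < 1.

```java
// Standard textbook estimates of the average number of comparisons per search.
// Assumed, not confirmed, to be the formulas behind the figures.
class HashCost {
    // Linear probing
    static double linearUnsuccessful(double lambda) {
        return 0.5 * (1 + 1 / ((1 - lambda) * (1 - lambda)));
    }

    static double linearSuccessful(double lambda) {
        return 0.5 * (1 + 1 / (1 - lambda));
    }

    // Quadratic probing or double hashing
    static double quadraticUnsuccessful(double lambda) {
        return 1 / (1 - lambda);
    }

    static double quadraticSuccessful(double lambda) {
        return (1 / lambda) * Math.log(1 / (1 - lambda));
    }

    // Separate chaining
    static double chainingUnsuccessful(double lambda) {
        return lambda;
    }

    static double chainingSuccessful(double lambda) {
        return 1 + lambda / 2;
    }

    public static void main(String[] args) {
        System.out.println(linearUnsuccessful(0.5));    // 2.5
        System.out.println(linearSuccessful(0.5));      // 1.5
        System.out.println(quadraticUnsuccessful(0.5)); // 2.0
        System.out.println(chainingSuccessful(1.0));    // 1.5
    }
}
```

At λ = 0.5 an unsuccessful search under linear probing already averages 2.5 comparisons, and the cost grows quickly as λ approaches 1, which is consistent with the recommended limits on the load factor.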
References
 Data Structures and Abstractions with Java. Authors: Frank M. Carrano & Walter Savitch. Chapter 19.
 Data Structures with Java. Authors: Hubbard J.R. & Huray A. Chapter 9.
Conclusion
Q & A Session
