
# Week 14: Hashing

STIA2024
Data Structures & Algorithm Analysis

Chapter Contents
- What is Hashing?
- Hash Functions
- Computing Hash Codes
- Compressing a Hash Code into an Index for the Hash Table
- Resolving Collisions
  - Open Addressing with Linear Probing
  - Separate Chaining

Learning Objectives
- To describe the basic idea of hashing,
- To describe the purpose of a hash table and a hash function,
- To describe how a hash function compresses a hash code into an index into the hash table,
- To explain what collisions are and why they occur,
- To describe open addressing as a method to resolve collisions,
- To describe linear probing and quadratic probing,
- To describe separate chaining as a method to resolve collisions, and
- To describe the relative efficiencies of various collision resolution techniques.
Chapter Contents (ctd.)

- Efficiency
  - The Cost of Open Addressing
  - The Cost of Separate Chaining

What is Hashing?
- A technique that determines an index, or location, for storing an item in a data structure
  - The hash function receives the search key
  - It returns the index of an element in an array called the hash table
  - This index is known as the hash index
- Hashing can be an excellent choice when searching is …
- Ideally, hashing results in O(1) search time
- A perfect hash function maps each search key into a different integer that is suitable as an index into the hash table
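As a minimal sketch, assume the search keys are known in advance to be the integers 0 through capacity - 1. Then the identity function is a perfect hash function: every key gets its own array location, so adding and retrieving each take one array access. The class PerfectHashDemo and its methods are hypothetical, for illustration only.

```java
// Hypothetical example: the search keys are known to be the integers
// 0..capacity-1, so h(key) = key is a perfect hash function.
class PerfectHashDemo {
    private final String[] hashTable;

    PerfectHashDemo(int capacity) {
        hashTable = new String[capacity];
    }

    // Perfect hash function: each distinct key maps to a distinct index.
    private int hash(int key) {
        if (key < 0 || key >= hashTable.length)
            throw new IllegalArgumentException("key out of range: " + key);
        return key;
    }

    void add(int key, String value) {
        hashTable[hash(key)] = value;   // one array access: O(1)
    }

    String getValue(int key) {
        return hashTable[hash(key)];    // one array access: O(1)
    }

    public static void main(String[] args) {
        PerfectHashDemo table = new PerfectHashDemo(10);
        table.add(3, "three");
        table.add(7, "seven");
        System.out.println(table.getValue(7)); // seven
        System.out.println(table.getValue(5)); // null (nothing stored there)
    }
}
```

Such a perfect hash function exists only because the key set is small and known; general keys need the two-step process described next.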

What is Hashing?

## Fig. 1: A hash function indexes its hash table.

What is Hashing?
- The hash function performs two steps
  1. Convert the search key into an integer called the hash code
  2. Compress the hash code into the range of indices for the hash table
- Typical hash functions are not perfect
  - They can allow more than one search key to map into a single index
  - This is known as a collision
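The two steps can be sketched in Java as follows. This is an illustration, not the course's implementation: it uses String.hashCode() for step 1 and Math.floorMod for step 2 (so the index is never negative), and the class name TwoStepHash is hypothetical. The keys "Aa" and "BB" are chosen because they have the same String.hashCode() value (2112), so they collide no matter how large the table is.

```java
class TwoStepHash {
    // Step 1: convert the search key into an integer hash code.
    // Step 2: compress the hash code into the index range 0..tableSize-1.
    static int hashIndex(String key, int tableSize) {
        int hashCode = key.hashCode();             // step 1
        return Math.floorMod(hashCode, tableSize); // step 2 (result is never negative)
    }

    public static void main(String[] args) {
        int tableSize = 11;
        System.out.println(hashIndex("Aa", tableSize)); // 0
        System.out.println(hashIndex("BB", tableSize)); // 0 again: a collision
    }
}
```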

Hash Functions

- General characteristics of a good hash function
  - Minimize collisions
  - Distribute entries uniformly throughout the hash table
  - Be fast to compute
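To see why uniform distribution matters, the following hypothetical sketch counts how many keys land at each index of a 7-location table under two hash codes: a deliberately poor one (the key's length) and String.hashCode(). Because all five sample words have length 5, the poor hash code sends every one of them to index 5. DistributionDemo and the sample words are assumptions for illustration.

```java
import java.util.Arrays;

class DistributionDemo {
    // Counts how many keys land at each index of a table of the given size.
    static int[] occupancy(String[] keys, int tableSize, boolean useLengthOnly) {
        int[] counts = new int[tableSize];
        for (String key : keys) {
            // A deliberately poor hash code (the key's length) versus String.hashCode().
            int code = useLengthOnly ? key.length() : key.hashCode();
            counts[Math.floorMod(code, tableSize)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "grape", "lemon", "mango", "peach"};
        System.out.println(Arrays.toString(occupancy(keys, 7, true)));  // [0, 0, 0, 0, 0, 5, 0]
        System.out.println(Arrays.toString(occupancy(keys, 7, false)));
    }
}
```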

Computing Hash Codes
- We override the hashCode method that every class inherits from Object
- Guidelines
  - If a class overrides the method equals, it should override hashCode
  - If the method equals considers two objects equal, hashCode must return the same value for both objects
  - If hashCode is invoked more than once on the same object during one execution of a program, and the data that equals uses has not changed, it must return the same hash code each time
  - An object's hash code can differ from one execution of a program to another execution of the same program
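A minimal sketch of a key class that follows these guidelines. StudentId, its fields, and the sample values are hypothetical; java.util.Objects.hash is a standard library helper that combines the given values into one hash code. The essential point is that hashCode uses exactly the fields that equals compares.

```java
import java.util.Objects;

// Hypothetical search-key class used only to illustrate the guidelines.
final class StudentId {
    private final String faculty;
    private final int number;

    StudentId(String faculty, int number) {
        this.faculty = faculty;
        this.number = number;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof StudentId)) return false;
        StudentId that = (StudentId) other;
        return number == that.number && Objects.equals(faculty, that.faculty);
    }

    // Uses exactly the fields that equals compares, so equal objects always
    // get equal hash codes; the value never changes because both fields are final.
    @Override
    public int hashCode() {
        return Objects.hash(faculty, number);
    }

    public static void main(String[] args) {
        StudentId a = new StudentId("SOC", 1234);
        StudentId b = new StudentId("SOC", 1234);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```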
Computing Hash Codes
- The hash code for a string, s, computed with Horner's method:
int hash = 0;
int n = s.length();
for (int i = 0; i < n; i++)
    hash = g * hash + s.charAt(i); // g is a positive constant, such as 31
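A self-contained version of the loop above, wrapped in a hypothetical StringHash.hash method. With g = 31 it computes the same value as java.lang.String.hashCode(), whose documented formula is s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic, where overflow simply wraps around.

```java
class StringHash {
    // Horner's method: s[0]*g^(n-1) + s[1]*g^(n-2) + ... + s[n-1],
    // computed in int arithmetic, so overflow simply wraps around.
    static int hash(String s, int g) {
        int hash = 0;
        int n = s.length();
        for (int i = 0; i < n; i++)
            hash = g * hash + s.charAt(i);
        return hash;
    }

    public static void main(String[] args) {
        // With g = 31 the result matches java.lang.String.hashCode().
        System.out.println(hash("Java", 31) == "Java".hashCode()); // true
    }
}
```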

- Hash code for a primitive type
  - Use the primitive-typed key itself (e.g., an int)
  - Manipulate the key's internal binary representation
  - Use folding: split the binary representation into pieces and combine them with a bitwise operation such as exclusive or
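Folding can be sketched for a 64-bit long key: split it into its high and low 32-bit halves and combine them with exclusive or. For a double, Double.doubleToLongBits first exposes the internal binary representation. The FoldingDemo class is hypothetical; the comparison with Long.hashCode and Double.hashCode (static methods since Java 8) is included because the standard library uses this same fold.

```java
class FoldingDemo {
    // Folding: split the 64-bit key into two 32-bit halves and combine them with XOR.
    static int fold(long key) {
        int high = (int) (key >>> 32);
        int low = (int) key;
        return high ^ low;
    }

    // For a double, first take its IEEE 754 bit pattern, then fold it.
    static int fold(double key) {
        return fold(Double.doubleToLongBits(key));
    }

    public static void main(String[] args) {
        long big = 0x1234_5678_9ABC_DEF0L;
        System.out.println(fold(big) == Long.hashCode(big));   // true
        System.out.println(fold(2.5) == Double.hashCode(2.5)); // true
    }
}
```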
Compressing a Hash Code
- The hash code must be compressed so that it fits into the range of table indices
- The typical method for a code c is to compute c modulo n (c % n)
  - n is the size of the table, ideally a prime number
  - The index is then between 0 and n – 1, once a negative remainder (possible in Java when c is negative) is shifted into range by adding n
private int getHashIndex(Object key)
{
    int hashIndex = key.hashCode() % hashTable.length;
    if (hashIndex < 0) // % yields a negative result when hashCode() is negative
        hashIndex = hashIndex + hashTable.length;
    return hashIndex;
} // end getHashIndex
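The slide's getHashIndex can be exercised in a small self-contained class. CompressDemo and its table size of 11 (a prime) are assumptions for illustration, and the method is package-private here only so it can be called from outside the class. The second call shows why the negative check is needed: in Java, -7 % 11 is -7, not 4.

```java
class CompressDemo {
    private final Object[] hashTable;

    CompressDemo(int primeSize) {
        hashTable = new Object[primeSize];
    }

    // Same logic as the slide's getHashIndex: c % n, then shift a negative remainder into range.
    int getHashIndex(Object key) {
        int hashIndex = key.hashCode() % hashTable.length;
        if (hashIndex < 0)
            hashIndex = hashIndex + hashTable.length;
        return hashIndex;
    }

    public static void main(String[] args) {
        CompressDemo demo = new CompressDemo(11);
        System.out.println(demo.getHashIndex(25)); // 3  (25 % 11)
        System.out.println(demo.getHashIndex(-7)); // 4  (-7 % 11 is -7; -7 + 11 = 4)
    }
}
```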
Resolving Collisions

- Options when the hash function returns a location that is already used in the table
  - Use another location in the table (open addressing)
  - Change the structure of the hash table so that each array location can represent multiple values (separate chaining)

Probing
- An open addressing scheme locates an alternate location in the hash table that is available, or open
- Locating an open location in a hash table is called probing
- Linear probing
  - Resolves a collision by examining consecutive locations in the hash table, beginning at the original hash index, until it locates the next available location
  - If a collision occurs at hashTable[k], look successively at locations k + 1, k + 2, …, wrapping around to index 0 after the last location
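A minimal sketch of linear probing for add and getValue, assuming keys are never null and entries are never removed (removal needs the available state described later). LinearProbingTable is a hypothetical name, and it compresses with Math.floorMod rather than the slide's getHashIndex; for a positive table size both give the same index.

```java
class LinearProbingTable<K, V> {
    private final Object[] keys;    // null means the location has never been used
    private final Object[] values;

    LinearProbingTable(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    // Compresses the hash code into 0..capacity-1 (never negative).
    private int getHashIndex(K key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Linear probing: examine k, k + 1, k + 2, ... (wrapping around) until the
    // key or a null location is found. Returns -1 if every location was examined.
    private int probe(K key) {
        int index = getHashIndex(key);
        for (int i = 0; i < keys.length; i++) {
            if (keys[index] == null || keys[index].equals(key))
                return index;
            index = (index + 1) % keys.length;
        }
        return -1;
    }

    void add(K key, V value) {
        int index = probe(key);
        if (index < 0)
            throw new IllegalStateException("hash table is full");
        keys[index] = key;      // new key, or same key whose value is replaced
        values[index] = value;
    }

    @SuppressWarnings("unchecked")
    V getValue(K key) {
        int index = probe(key);
        return (index < 0 || keys[index] == null) ? null : (V) values[index];
    }

    public static void main(String[] args) {
        LinearProbingTable<Integer, String> table = new LinearProbingTable<>(7);
        table.add(3, "a");   // hash index 3
        table.add(10, "b");  // 10 % 7 == 3: collision, stored at 4
        table.add(17, "c");  // 17 % 7 == 3: probes 4, stored at 5
        System.out.println(table.getValue(17)); // c
    }
}
```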
Probing

## Fig. 3: The effect of linear probing after adding four entries whose search keys hash to the same index.
Probing

## Fig. 4: A revision of the hash table shown in Fig. 3 when linear probing resolves collisions; each entry contains a search key and its associated value.
Removals

## Fig. 5: A hash table if remove used null to remove entries.
Removals
- We need to distinguish among three kinds of locations in the hash table:
1. Occupied
   - The location references an entry in the dictionary
2. Empty
   - The location contains null and always did
3. Available
   - The location's entry was removed from the dictionary

Probing

## Fig. 6: A linear probe sequence (a) after adding an entry; (b) after removing two entries;
Probing

## Fig. 6: A linear probe sequence (c) after a search; (d) during the search while adding an entry; (e) after an addition to a formerly occupied location.
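Searches That Dictionary Operations Require
- To retrieve an entry
  - Search the probe sequence for the key
  - Examine occupied locations; skip over (do not stop at) locations in the available state
  - Stop the search when the key is found or a null location is reached
- To remove an entry
  - Search the probe sequence as for retrieval
  - If the key is found, mark its location as available
- To add an entry
  - Search the probe sequence as for retrieval
  - Note the first available location encountered
  - If the key is not found, place the new entry in that available location, or in the null location that ended the search if no available location was seen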
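The three operations above can be sketched with open addressing and linear probing. This is a hypothetical illustration, not the textbook's class: a private sentinel object, AVAILABLE, marks locations whose entry was removed, null marks locations that were never used, and any other key marks an occupied location. It assumes non-null keys.

```java
class ProbingWithRemoval<K, V> {
    // Sentinel for the "available" state: an entry used to be here but was removed.
    private static final Object AVAILABLE = new Object();

    // Each location is null (empty), AVAILABLE, or a key (occupied).
    private final Object[] keys;
    private final Object[] values;

    ProbingWithRemoval(int capacity) {
        keys = new Object[capacity];
        values = new Object[capacity];
    }

    private int getHashIndex(K key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    // Retrieval search: skip available locations; stop at the key or at null.
    private int locate(K key) {
        int index = getHashIndex(key);
        for (int i = 0; i < keys.length && keys[index] != null; i++) {
            if (keys[index] != AVAILABLE && keys[index].equals(key))
                return index;
            index = (index + 1) % keys.length;
        }
        return -1;
    }

    @SuppressWarnings("unchecked")
    V getValue(K key) {
        int index = locate(key);
        return index < 0 ? null : (V) values[index];
    }

    // Removal: search as for retrieval, then mark the location available (not null).
    @SuppressWarnings("unchecked")
    V remove(K key) {
        int index = locate(key);
        if (index < 0)
            return null;
        V old = (V) values[index];
        keys[index] = AVAILABLE;
        values[index] = null;
        return old;
    }

    // Addition: search as for retrieval, noting the first available location.
    void add(K key, V value) {
        int index = getHashIndex(key);
        int firstAvailable = -1;
        for (int i = 0; i < keys.length && keys[index] != null; i++) {
            if (keys[index] == AVAILABLE) {
                if (firstAvailable < 0)
                    firstAvailable = index;
            } else if (keys[index].equals(key)) {
                values[index] = value;   // key already present: replace its value
                return;
            }
            index = (index + 1) % keys.length;
        }
        // Key not found: reuse the first available location if there was one,
        // otherwise the null location that ended the search.
        int target = firstAvailable >= 0 ? firstAvailable : (keys[index] == null ? index : -1);
        if (target < 0)
            throw new IllegalStateException("hash table is full");
        keys[target] = key;
        values[target] = value;
    }

    public static void main(String[] args) {
        ProbingWithRemoval<Integer, String> table = new ProbingWithRemoval<>(7);
        table.add(3, "a");
        table.add(10, "b");                     // collides with 3, stored at index 4
        table.remove(3);                        // index 3 becomes available
        System.out.println(table.getValue(10)); // b (the search skips index 3)
    }
}
```

If remove stored null instead of AVAILABLE, the search for 10 would stop at index 3 and wrongly report the key as missing, which is the problem Fig. 5 illustrates.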
Probing
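- Change the probe sequence (quadratic probing)
  - Given the hash index k of a search key
  - Probe $k + 1, k + 2^2, k + 3^2, \ldots, k + j^2, \ldots$, each taken modulo the table size
- If the table size is a prime number, the sequence reaches at least half of the locations in the hash table, so it finds an open location whenever the table is less than half full
- Avoids primary clustering (the long runs of occupied locations that linear probing creates)
- But can lead to secondary clustering (keys with the same hash index follow the same probe sequence)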
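The following sketch lists the distinct locations visited by the quadratic probe sequence. QuadraticProbe is a hypothetical helper; it assumes k is already a valid (non-negative) hash index. For a table of prime size 7 starting at index 0, only 4 of the 7 locations are ever visited, which is (7 + 1) / 2.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbe {
    // j-th location in the quadratic probe sequence that starts at hash index k:
    // k, k + 1, k + 4, k + 9, ..., each taken modulo the table size.
    static int probe(int k, int j, int tableSize) {
        return (int) ((k + (long) j * j) % tableSize);
    }

    // Distinct locations visited by the first tableSize probes, in visiting order.
    static Set<Integer> visited(int k, int tableSize) {
        Set<Integer> seen = new LinkedHashSet<>();
        for (int j = 0; j < tableSize; j++)
            seen.add(probe(k, j, tableSize));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(visited(0, 7)); // [0, 1, 4, 2]
    }
}
```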

Probing

## Fig. 7: A probe sequence of length 5

Separate Chaining
- Alter the structure of the hash table
  - Each location can represent multiple values
  - Each location is called a bucket
- A bucket can be a(n)
  - List
  - Sorted list
  - Array
  - Vector
- Collisions are resolved by placing entries whose keys hash to the same index in the same bucket
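A minimal separate-chaining sketch in which each bucket is a java.util.LinkedList. ChainedHashTable and its methods are hypothetical; it assumes non-null, distinct keys, so add replaces the value when the key is already in its bucket.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainedHashTable<K, V> {
    // One key-value pair stored in a bucket.
    private final class Entry {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // buckets.get(i) is the chain for hash index i.
    private final List<LinkedList<Entry>> buckets;

    ChainedHashTable(int size) {
        buckets = new ArrayList<>(size);
        for (int i = 0; i < size; i++)
            buckets.add(new LinkedList<>());
    }

    private LinkedList<Entry> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    // Distinct keys: replace the value if the key is already in its chain,
    // otherwise append a new entry to the end of the chain.
    void add(K key, V value) {
        LinkedList<Entry> bucket = bucketFor(key);
        for (Entry e : bucket) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        bucket.addLast(new Entry(key, value));
    }

    V getValue(K key) {
        for (Entry e : bucketFor(key))
            if (e.key.equals(key))
                return e.value;
        return null;
    }

    int bucketLength(int index) {
        return buckets.get(index).size();
    }

    public static void main(String[] args) {
        ChainedHashTable<Integer, String> table = new ChainedHashTable<>(7);
        table.add(3, "a");
        table.add(10, "b");
        table.add(17, "c");                      // all three hash to index 3
        System.out.println(table.bucketLength(3)); // 3
        System.out.println(table.getValue(17));    // c
    }
}
```

Unlike open addressing, no probing is needed: colliding entries simply share one chain.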
Separate Chaining

## Fig. 9: A hash table for use with separate chaining; each bucket is a chain of linked nodes.
Separate Chaining

## Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (a) duplicate and unsorted;
Separate Chaining

## Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (b) distinct and unsorted;
Separate Chaining

## Fig. 10: Where a new entry is inserted into a linked bucket when integer search keys are (c) distinct and sorted.
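For the sorted case, insertion walks the chain to the first node whose key is not smaller than the new key and links the new node in front of it. The SortedChain class below is a hypothetical sketch of a single bucket with integer keys, not the figure's exact code; the sample keys are arbitrary.

```java
class SortedChain {
    // A node in one bucket's chain of linked nodes.
    private static final class Node {
        final int key;
        String value;
        Node next;

        Node(int key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node first;

    // Distinct, sorted keys: walk until the first node whose key is >= the new key;
    // replace the value if equal, otherwise link the new node in just before it.
    void add(int key, String value) {
        Node previous = null;
        Node current = first;
        while (current != null && current.key < key) {
            previous = current;
            current = current.next;
        }
        if (current != null && current.key == key) {
            current.value = value;                         // key already present
        } else if (previous == null) {
            first = new Node(key, value, current);         // new first node
        } else {
            previous.next = new Node(key, value, current); // between previous and current
        }
    }

    // Keys from the first node to the last, separated by spaces.
    String keysInOrder() {
        StringBuilder sb = new StringBuilder();
        for (Node n = first; n != null; n = n.next)
            sb.append(sb.length() == 0 ? "" : " ").append(n.key);
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedChain chain = new SortedChain();
        chain.add(26, "a");
        chain.add(4, "b");
        chain.add(15, "c");
        System.out.println(chain.keysInOrder()); // 4 15 26
    }
}
```

Keeping the chain sorted lets an unsuccessful search stop as soon as it passes the position where the key would be.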
Separate Chaining

- Separate chaining provides an efficient and simple way to resolve collisions
- However, separate chaining requires more memory (for the links in each chain)
Efficiency Observations

- Successful retrieval or removal
  - Same efficiency as a successful search
- Unsuccessful retrieval or removal
  - Same efficiency as an unsuccessful search
- Successful addition
  - Same efficiency as an unsuccessful search
- Unsuccessful addition (the key is already present, so its value is replaced)
  - Same efficiency as a successful search


- A perfect hash function is not always possible or practical
  - Thus, collisions are likely to occur
- As the hash table fills
  - Collisions occur more often
- The load factor $\lambda$ measures how full the table is: the ratio of the number of entries to the size of the hash table
  - $\lambda = \frac{\text{number of entries}}{\text{number of locations in the hash table}}$


- $\lambda$ is zero when the hash table is empty
- For open addressing, the maximum value of $\lambda$ is 1, when the hash table is full
- $\lambda$ does not count locations in the available state
- For separate chaining, $\lambda$ has no maximum value


Note: Reasonable efficiency requires $\lambda < 0.5$

## Fig. 11: The average number of comparisons required by a search of the hash table for given values of the load factor when using linear probing.
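Curves like these are commonly based on the standard approximations analyzed by Knuth for linear probing, where $\lambda$ is the load factor. They are given here as widely used estimates, not as values read from the figure:

$$
\text{successful search} \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right),
\qquad
\text{unsuccessful search} \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)
$$

At $\lambda = 0.5$ these give 1.5 and 2.5 comparisons; at $\lambda = 0.9$ they grow to 5.5 and 50.5, which is why the load factor is kept below 0.5.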

Note: With quadratic probing or double hashing, you should also keep $\lambda < 0.5$

## Fig. 12: The average number of comparisons required by a search of the hash table for given values of the load factor when using either quadratic probing or double hashing.
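For quadratic probing and double hashing, the commonly used estimates (again approximations, not values read from the figure) are:

$$
\text{successful search} \approx \frac{1}{\lambda}\ln\frac{1}{1-\lambda},
\qquad
\text{unsuccessful search} \approx \frac{1}{1-\lambda}
$$

At $\lambda = 0.5$ these give about 1.39 and 2 comparisons, somewhat better than linear probing at the same load factor.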
Cost of Separate Chaining

Note: Reasonable efficiency requires only $\lambda < 1$

## Fig. 13: The average number of comparisons required by a search of the hash table for given values of the load factor when using separate chaining.
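For separate chaining, the standard estimates, where $\lambda$ is the average chain length, are (approximations, not values read from the figure):

$$
\text{successful search} \approx 1 + \frac{\lambda}{2},
\qquad
\text{unsuccessful search} \approx \lambda
$$

Both grow only linearly in $\lambda$: at $\lambda = 1$ a successful search averages about 1.5 comparisons and an unsuccessful one about 1, which is why a load factor below 1 is sufficient here.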
References
- Frank M. Carrano and Walter Savitch. Data Structures and Abstractions with Java. Chapter 19.

A. . Chapter 9

Conclusion

Q & A Session
