Separate Chaining Hash Table

Data Structures
Hashing — Separate Chaining
C. Aravindan
<AravindanC@ssn.edu.in>
Professor of Computer Science

SSN College of Engineering, Chennai
April 24, 2019
C. Aravindan (SSN) Data Structures April 24, 2019 1 / 28

Outline
1 Introduction
2 Hash Functions
3 Separate Chaining
4 Summary

Introduction
There are several applications that require a “look-up” table: (key,
value) pairs to be stored
Insertion, search, and deletion based on keys
FindMin, FindMax, sorting, traversal may not be needed
Examples: dictionary lookup, symbol table, database indexing
We may use a Binary Search Tree with balancing
Complexity: O(log n)
Is it possible to do this in constant time (on the average)?

Basic Idea

Issues
Selection of the hashing function is crucial to achieve constant time
on the average
The hashing function should ideally handle any type of key (immutable)
It should be easy to compute!
The hashing function should ideally generate distinct indexes for
distinct keys
Generally, the number of possible keys (the “key density”) is very high
compared to the number of pairs we wish to store
How do we choose an appropriate TableSize?
What to do when the hashing function maps more than one key to an
index (collision handling)?

Simple Hash Function
If the key is an integer, then a simple hash function (key mod
TableSize) may work
A simple analysis reveals that the keys are well distributed across the
index space when TableSize is a prime number

Index
Hash( KeyType Key, int TableSize )
{
    /* Assumes an integer, non-negative Key; in C, % can yield a
       negative result for a negative Key */
    return Key % TableSize;
}
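The effect of a prime TableSize can be checked with a small experiment. The sketch below is an illustration only: it assumes non-negative int keys, and HashInt and CountUsedSlots are hypothetical helper names, not part of the slides. It hashes the patterned keys 0, step, 2*step, ... and counts how many distinct indexes they occupy.

```c
#include <assert.h>
#include <stdlib.h>

/* key mod TableSize, restricted to non-negative int keys (assumption) */
int HashInt(int key, int tableSize)
{
    assert(key >= 0 && tableSize > 0);
    return key % tableSize;
}

/* Hypothetical helper: hash the keys 0, step, 2*step, ... (count keys in
   total) and return the number of distinct indexes they land on. */
int CountUsedSlots(int step, int count, int tableSize)
{
    char *used;
    int i, distinct = 0;

    assert(step > 0 && count >= 0 && tableSize > 0);
    used = calloc((size_t)tableSize, 1);
    assert(used != NULL);
    for (i = 0; i < count; i++) {
        int idx = HashInt(i * step, tableSize);
        if (!used[idx]) {
            used[idx] = 1;
            distinct++;
        }
    }
    free(used);
    return distinct;
}
```

With keys that are multiples of 10, CountUsedSlots(10, 100, 10) reports a single index (every key maps to 0), whereas CountUsedSlots(10, 100, 11) reports all 11 indexes in use.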

How do we hash strings?
For categorical data, such as strings, we may first convert the key to
an integer
A simple idea for strings will be to just add up the ASCII values of the
characters

Index
Hash1( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    /* Add up the character codes of the string */
    while( *Key != '\0' )
        HashVal += *Key++;
    return HashVal % TableSize;
}

Hashing strings
This idea may be bad when TableSize is large and the strings are short
Example: TableSize = 10007 and strings are 8 characters at most
Only indexes up to (127 * 8 = 1016) are used!
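The bound can be checked directly. The sketch below repeats the additive hash so that it compiles on its own; the cast to unsigned char is an assumption added here so that the sum does not depend on whether char is signed.

```c
#include <assert.h>
#include <stddef.h>

/* Additive string hash: sum of character codes, reduced mod tableSize */
unsigned int Hash1(const char *key, int tableSize)
{
    unsigned int hashVal = 0;

    assert(key != NULL && tableSize > 0);
    while (*key != '\0')
        hashVal += (unsigned char)*key++;
    return hashVal % (unsigned int)tableSize;
}
```

With TableSize = 10007, the largest 8-character ASCII key (eight characters of code 127) hashes to 1016, so indexes 1017 to 10006 are never used. The sum also ignores character order: "abc" and "cab" both hash to 294.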

Another hash function for strings
We may do the following to generate a larger integer
$key[0] + 27 \cdot key[1] + 27^2 \cdot key[2]$
Why only 3 characters? $26^3 = 17576$ combinations!

Index
Hash2( const char *Key, int TableSize )
{
    /* Uses only the first three characters; assumes Key has at
       least three */
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] )
           % TableSize;
}
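Hash2 reads Key[1] and Key[2] unconditionally, so a key of fewer than two characters would be read past its terminator. A guarded variant might look like the following sketch; Hash2Safe is a hypothetical name, and treating missing characters as 0 is an assumption made here, not something the slides specify.

```c
#include <assert.h>
#include <stddef.h>

/* Three-character polynomial hash that tolerates short keys:
   characters beyond the end of the string count as 0 (assumption). */
unsigned int Hash2Safe(const char *key, int tableSize)
{
    unsigned int k[3] = {0, 0, 0};
    int i;

    assert(key != NULL && tableSize > 0);
    /* Copy at most three characters, stopping at the terminator */
    for (i = 0; i < 3 && key[i] != '\0'; i++)
        k[i] = (unsigned char)key[i];
    return (k[0] + 27u * k[1] + 729u * k[2]) % (unsigned int)tableSize;
}
```

For "abc" the polynomial is 97 + 27 * 98 + 729 * 99 = 74914, which is 4865 mod 10007. Keys that share their first three characters, such as "abcdef" and "abcxyz", still collide.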

Simplifying the calculations
But, in reality there may be fewer than 3000 combinations!
And the computation gets complex when the number of characters
is increased
The above computation may be simplified as follows (referred to as
Horner’s rule)
$k_0 + 27 k_1 + 27^2 k_2 = ((27 \cdot k_2) + k_1) \cdot 27 + k_0$
This idea may be extended to more characters
$\sum_{i=0}^{KeySize-1} Key[KeySize - i - 1] \cdot 32^i$
The magic number 32 is used here, since multiplication can then be
achieved by left shift!
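Horner's rule can be verified numerically. In the sketch below, PolyDirect, PolyHorner and HornerHash are hypothetical names used for illustration. HornerHash evaluates the general sum for an arbitrary multiplier, processing the string left to right so that the last character gets weight 1 and the first character gets the highest power, as in the summation above.

```c
#include <assert.h>
#include <stddef.h>

/* Direct evaluation of k0 + 27*k1 + 27^2*k2 */
unsigned int PolyDirect(unsigned int k0, unsigned int k1, unsigned int k2)
{
    return k0 + 27u * k1 + 27u * 27u * k2;
}

/* The same polynomial in Horner form: two multiplications, no powers */
unsigned int PolyHorner(unsigned int k0, unsigned int k1, unsigned int k2)
{
    return ((27u * k2) + k1) * 27u + k0;
}

/* General Horner evaluation over a whole string (no modulo applied);
   unsigned arithmetic wraps around on overflow. */
unsigned int HornerHash(const char *key, unsigned int base)
{
    unsigned int h = 0;

    assert(key != NULL);
    while (*key != '\0')
        h = h * base + (unsigned char)*key++;
    return h;
}
```

Both forms give 74914 for (k0, k1, k2) = (97, 98, 99). With base 32, HornerHash("ab", 32) is 97 * 32 + 98 = 3202, the same value as (97 << 5) + 98.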

A better hash function for strings
Index
Hash3( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    /* Horner's rule with multiplier 32: shift left by 5, then add */
    while( *Key != '\0' )
        HashVal = ( HashVal << 5 ) + *Key++;
    return HashVal % TableSize;
}
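One property of the shift-based hash is worth checking. Each step shifts the earlier characters 5 bits further left, so with 32-bit arithmetic the first character of a key of 8 or more characters is shifted out entirely (7 shifts of 5 bits is 35 bits). The sketch below fixes the width with uint32_t to make this explicit; Hash3_32 is a hypothetical name, and the width of unsigned int in the slide's version depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add string hash with an explicit 32-bit accumulator */
uint32_t Hash3_32(const char *key, uint32_t tableSize)
{
    uint32_t hashVal = 0;

    assert(key != NULL && tableSize > 0);
    while (*key != '\0')
        hashVal = (hashVal << 5) + (unsigned char)*key++;
    return hashVal % tableSize;
}
```

"Xb" and "Yb" hash differently, but "Xbcdefgh" and "Ybcdefgh" hash to the same index because the first character no longer contributes. This may matter when long keys differ only in their leading characters.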

Collision Handling
Separate Chaining: use additional data structures to store the pairs
having the same home index
Open Addressing: use only the array and systematically look for an
alternate index
    Linear Probing
    Quadratic Probing
    Double Hashing
Rehashing to increase or reduce the TableSize
Extendible Hashing

Separate Chaining
The idea here is to use another collection data structure to keep all
the pairs that hash to the same index
A simple linked list will be an ideal choice!

Data Structure
typedef struct PairStruct {
    KeyType key;
    ValType value;
} *Pair;

struct ListNode
{
    Pair Element;
    struct ListNode *Next;
};
typedef struct ListNode *Position;
typedef Position List;

struct HashTbl
{
    int TableSize;
    List *TheLists;    /* array of list headers */
};
typedef struct HashTbl *HashTable;

Interface
// Constructor
HashTable InitializeTable( int TableSize );
// Destructor
void DestroyTable( HashTable H );
// Insert a key-value pair
void Insert( Pair pair, HashTable H );
// Find and Retrieve
Position Find( KeyType Key, HashTable H );
Pair Retrieve( Position P );
Pair FindRetrieve( KeyType Key, HashTable H );

Implementation: Constructor
HashTable
InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )
    {
        Error( "Table size too small" );
        return NULL;
    }

    /* Allocate table */
    H = malloc( sizeof( struct HashTbl ) );
    if( H == NULL )
        FatalError( "Out of space!!!" );
    H->TableSize = NextPrime( TableSize );

    /* Allocate array of lists */
    H->TheLists = malloc( sizeof( List ) * H->TableSize );
    if( H->TheLists == NULL )
        FatalError( "Out of space!!!" );

    /* Allocate list headers */
    for( i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) );
        if( H->TheLists[ i ] == NULL )
            FatalError( "Out of space!!!" );
        else
            H->TheLists[ i ]->Next = NULL;
    }
    return H;
}
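The constructor relies on NextPrime, which the slides do not define. A minimal sketch follows, assuming that NextPrime returns the smallest prime greater than or equal to its argument and using plain trial division; IsPrime is a hypothetical helper.

```c
#include <assert.h>

/* Trial division by 2 and by odd numbers up to sqrt(n) */
int IsPrime(int n)
{
    int d;

    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (d = 3; d <= n / d; d += 2)   /* d <= n / d avoids overflow of d * d */
        if (n % d == 0)
            return 0;
    return 1;
}

/* Smallest prime >= n (assumed meaning of NextPrime) */
int NextPrime(int n)
{
    assert(n >= 0);
    while (!IsPrime(n))
        n++;
    return n;
}
```

Under this assumption NextPrime(10000) gives 10007, the TableSize used in the earlier string-hashing example. Trial division is adequate here because the function runs once per table construction.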

Implementation: Find
Position
Find( KeyType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;    /* skip the list header */
    while( P != NULL && P->Element->key != Key )
        P = P->Next;
    return P;       /* NULL if Key is not found */
}
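Find compares keys with !=, which suits integer keys. If KeyType were a C string, != would compare pointers rather than contents. The sketch below shows one possible adaptation for a single chain, assuming string keys and int values; FindInChain is a hypothetical name, and the types are redeclared only so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed concrete types for this sketch: string keys, int values */
typedef struct PairStruct {
    const char *key;
    int value;
} *Pair;

struct ListNode {
    Pair Element;
    struct ListNode *Next;
};
typedef struct ListNode *Position;

/* Walk one chain, skipping its header node, and compare keys by content */
Position FindInChain(const char *Key, Position Header)
{
    Position P;

    assert(Key != NULL && Header != NULL);
    P = Header->Next;
    while (P != NULL && strcmp(P->Element->key, Key) != 0)
        P = P->Next;
    return P;   /* NULL if Key is not in this chain */
}
```

A lookup with a separately stored copy of the key (for example a local char array holding "pear") then finds the node whose key has the same characters, which a pointer comparison would miss.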

Implementation: Retrieve
Pair
Retrieve( Position P )
{
    return P->Element;
}

Implementation: Find and Retrieve
Pair
FindRetrieve( KeyType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;
    while( P != NULL && P->Element->key != Key )
        P = P->Next;
    if( P == NULL ) return NULL;
    else return P->Element;
}

Implementation: Insert (key, value) pair
void
Insert( Pair pair, HashTable H )
{
    Position Pos, NewCell;
    List L;
    KeyType Key = pair->key;

    Pos = Find( Key, H );
    if( Pos == NULL )    /* Key is not found */
    {
        NewCell = malloc( sizeof( struct ListNode ) );
        if( NewCell == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            /* Insert at the front of the list */
            L = H->TheLists[ Hash( Key, H->TableSize ) ];
            NewCell->Next = L->Next;
            NewCell->Element = pair;
            L->Next = NewCell;
        }
    }
}
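To see Find and Insert working together, here is a minimal self-contained sketch. It assumes non-negative int keys hashed with key % TableSize, a table size supplied directly by the caller, and assert-based allocation checks in place of FatalError. MakeTable and InsertPair are hypothetical names, chosen so as not to suggest that they are the routines from the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PairStruct { int key; int value; } *Pair;
struct ListNode { Pair Element; struct ListNode *Next; };
typedef struct ListNode *Position;
typedef Position List;
struct HashTbl { int TableSize; List *TheLists; };
typedef struct HashTbl *HashTable;

/* Build a table with one empty header node per index */
HashTable MakeTable(int tableSize)
{
    HashTable H;
    int i;

    assert(tableSize > 0);
    H = malloc(sizeof *H);
    assert(H != NULL);
    H->TableSize = tableSize;
    H->TheLists = malloc(sizeof(List) * (size_t)tableSize);
    assert(H->TheLists != NULL);
    for (i = 0; i < tableSize; i++) {
        H->TheLists[i] = malloc(sizeof(struct ListNode));
        assert(H->TheLists[i] != NULL);
        H->TheLists[i]->Element = NULL;
        H->TheLists[i]->Next = NULL;
    }
    return H;
}

/* Search the chain at the key's home index */
Position Find(int Key, HashTable H)
{
    Position P = H->TheLists[Key % H->TableSize]->Next;

    while (P != NULL && P->Element->key != Key)
        P = P->Next;
    return P;
}

/* Returns 1 if the pair was inserted, 0 if its key was already present */
int InsertPair(Pair pair, HashTable H)
{
    Position NewCell;
    List L;

    if (Find(pair->key, H) != NULL)
        return 0;
    NewCell = malloc(sizeof *NewCell);
    assert(NewCell != NULL);
    L = H->TheLists[pair->key % H->TableSize];
    NewCell->Element = pair;      /* the table stores the caller's pair */
    NewCell->Next = L->Next;      /* link in at the front of the chain */
    L->Next = NewCell;
    return 1;
}
```

With a table of size 11, the keys 3 and 14 share home index 3. Inserting 3 and then 14 leaves 14 at the front of that chain, and a second pair with key 3 is rejected.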

Implementation: Destructor
void
DestroyTable( HashTable H )
{
    for( int i = 0; i < H->TableSize; i++ )
    {
        /* Free the header and every node of list i;
           the pairs themselves are not freed */
        Position P = H->TheLists[ i ];
        Position Tmp;
        while( P != NULL )
        {
            Tmp = P->Next;
            free( P );
            P = Tmp;
        }
    }
    free( H->TheLists );
    free( H );
}
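The introduction lists deletion among the required operations, but the interface has no routine for it. The following is a sketch of how a key might be removed from a single chain, assuming integer keys and the header-node layout used by the constructor. DeleteFromChain is a hypothetical name and not part of the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct PairStruct { int key; int value; } *Pair;
struct ListNode { Pair Element; struct ListNode *Next; };
typedef struct ListNode *Position;

/* Unlink and free the node holding Key. Returns the pair it stored
   (still owned by the caller), or NULL if Key is not in the chain. */
Pair DeleteFromChain(int Key, Position Header)
{
    Position Prev, Cur;
    Pair removed;

    assert(Header != NULL);
    /* Stop at the node just before the one to delete */
    Prev = Header;
    while (Prev->Next != NULL && Prev->Next->Element->key != Key)
        Prev = Prev->Next;
    if (Prev->Next == NULL)
        return NULL;

    Cur = Prev->Next;
    Prev->Next = Cur->Next;
    removed = Cur->Element;
    free(Cur);
    return removed;
}
```

The header node means that the first real node has a predecessor like any other, so no special case is needed for deleting at the front of a chain.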

Complexity
Is it possible to achieve constant average time with this design?
What will be the average length of a linked list?
Load factor λ: ratio of the number of pairs to the TableSize
Average length of a linked list will be λ
We may achieve constant average time, if we can keep the load factor
λ close to 1 and the hash function distributes the keys well
Some other data structure, such as BST or another hash table, may
be tried instead of linked lists, but may not be worth the effort
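Both conditions can be illustrated numerically. The sketch below assumes non-negative int keys hashed with key % TableSize; LoadFactor and LongestChain are hypothetical helper names. The load factor depends only on the counts, while the longest chain depends on how the keys are distributed.

```c
#include <assert.h>
#include <stdlib.h>

/* Load factor: number of stored pairs divided by TableSize */
double LoadFactor(int numPairs, int tableSize)
{
    assert(numPairs >= 0 && tableSize > 0);
    return (double)numPairs / (double)tableSize;
}

/* Length of the longest chain that results from hashing the given keys */
int LongestChain(const int *keys, int numKeys, int tableSize)
{
    int *len;
    int i, longest = 0;

    assert(keys != NULL && numKeys >= 0 && tableSize > 0);
    len = calloc((size_t)tableSize, sizeof *len);
    assert(len != NULL);
    for (i = 0; i < numKeys; i++) {
        int idx = keys[i] % tableSize;   /* keys assumed non-negative */
        if (++len[idx] > longest)
            longest = len[idx];
    }
    free(len);
    return longest;
}
```

Six keys in 11 slots give a load factor of about 0.55 in every case. The keys 0 to 5 each get their own index, but in the set {3, 14, 25, 7, 8, 100} the keys 3, 14 and 25 all hash to index 3 and form a chain of length 3.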

Summary
Several applications need collections that support only insertion,
deletion, search
Hashing is an ideal solution that can achieve constant average time
We have discussed some simple hash functions and the issues involved
Collision is a major issue in implementing the hashing technique
Separate chaining is one of the solutions to handle collision: use a
secondary data structure, such as linked lists, to store all the objects
hashing to the same index
Load factor λ needs to be close to 1 for effective separate chaining
