
Data Structures

Hashing — Separate Chaining

C. Aravindan
<AravindanC@ssn.edu.in>

Professor of Computer Science


SSN College of Engineering, Chennai

April 24, 2019

C. Aravindan (SSN) Data Structures April 24, 2019 1 / 28



Outline

1 Introduction

2 Hash Functions

3 Separate Chaining

4 Summary




Introduction

There are several applications that require a “look-up” table: (key, value) pairs to be stored
Insertion, search, and deletion based on keys
FindMin, FindMax, sorting, traversal may not be needed
Examples: dictionary lookup, symbol table, database indexing
We may use a balanced binary search tree
Complexity: O(log n) per operation
Is it possible to do this in constant time (on average)?


Basic Idea



Issues

Selection of the hashing function is crucial to achieving constant time on average
The hashing function should ideally handle any type of (immutable) key
It should be easy to compute!
The hashing function should ideally generate distinct indexes for distinct keys
Generally, the number of possible keys is very large compared to the number of pairs we wish to store
How do we choose an appropriate TableSize?
What do we do when the hashing function maps more than one key to the same index (collision handling)?


Simple Hash Function

If the key is an integer, then a simple hash function (key mod TableSize) may work
A simple analysis reveals that the keys are well distributed across the index space when TableSize is a prime number

Index
Hash( KeyType Key, int TableSize )
{
    return Key % TableSize;
}
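To see why a prime TableSize helps, consider keys that share a common factor with the table size. The following is only a minimal sketch: it assumes Index and KeyType are plain integer types with non-negative keys, and CountUsedSlots is a hypothetical helper that is not part of the slides. It hashes the keys 0, Step, 2*Step, ... and counts how many distinct indexes are hit.

```c
#include <stdlib.h>

typedef unsigned int Index;   /* assumption: Index is an unsigned integer */
typedef int KeyType;          /* assumption: keys are non-negative ints   */

Index Hash( KeyType Key, int TableSize )
{
    return Key % TableSize;
}

/* Hypothetical helper: hash Count keys 0, Step, 2*Step, ... and
   return how many distinct table indexes receive at least one key.
   Returns -1 if memory cannot be allocated. */
int CountUsedSlots( int Step, int Count, int TableSize )
{
    int Used = 0;
    char *Seen = calloc( TableSize, sizeof( char ) );
    if( Seen == NULL )
        return -1;

    for( int i = 0; i < Count; i++ )
    {
        Index Slot = Hash( i * Step, TableSize );
        if( !Seen[ Slot ] )
        {
            Seen[ Slot ] = 1;   /* first key to land on this index */
            Used++;
        }
    }
    free( Seen );
    return Used;
}
```

With 100 keys that are multiples of 10, CountUsedSlots( 10, 100, 10 ) reports a single used index, while CountUsedSlots( 10, 100, 11 ) reports all 11 indexes in use.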


How do we hash strings?

For categorical data, such as strings, we may first convert the key to an integer
A simple idea for strings is to just add up the ASCII values of the characters

Index
Hash1( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += *Key++;

    return HashVal % TableSize;
}


Hashing strings

This idea works poorly when TableSize is large and the strings are short
Example: TableSize = 10007 and strings are 8 characters at most
Only indexes 0 to 1016 (127 * 8 = 1016) are ever used!
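The bound can be checked directly. This is a small sketch that repeats Hash1 from the earlier slide, assuming Index is an unsigned integer type and that all characters are 7-bit ASCII (codes 1 to 127).

```c
typedef unsigned int Index;   /* assumption: Index is an unsigned integer */

/* Sum of the character codes, reduced modulo TableSize */
Index Hash1( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal += *Key++;

    return HashVal % TableSize;
}
```

For example, Hash1( "zzzzzzzz", 10007 ) is 8 * 122 = 976, and eight characters of code 127 give the largest possible value, 1016. Indexes 1017 to 10006 can never be produced by keys of this length.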


Another hash function for strings

We may do the following to generate a larger integer

\[ key[0] + 27 \cdot key[1] + 27^2 \cdot key[2] \]

Why only 3 characters? There are already 26^3 = 17576 combinations of three letters!

Index
Hash2( const char *Key, int TableSize )
{
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] )
           % TableSize;
}
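One consequence of looking at only the first three characters is that keys sharing a three-character prefix always collide. The sketch below repeats Hash2 to illustrate this; it assumes Index is an unsigned integer type and that every key has at least two characters before the terminator, so that reading Key[ 2 ] is valid.

```c
typedef unsigned int Index;   /* assumption: Index is an unsigned integer */

/* Assumes Key holds at least two characters before the terminator,
   so that Key[ 2 ] is a valid read. */
Index Hash2( const char *Key, int TableSize )
{
    return ( Key[ 0 ] + 27 * Key[ 1 ] + 729 * Key[ 2 ] ) % TableSize;
}
```

With TableSize = 10007, both Hash2( "string", 10007 ) and Hash2( "strong", 10007 ) evaluate to (115 + 27 * 116 + 729 * 114) mod 10007 = 6297, because only 's', 't' and 'r' take part in the computation.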


Simplifying the calculations

But in reality fewer than 3000 of these combinations actually occur!
The computation also becomes more complex as the number of characters increases
The above computation may be simplified as follows (referred to as Horner's rule)

\[ k_0 + 27 k_1 + 27^2 k_2 = ((27 \cdot k_2) + k_1) \cdot 27 + k_0 \]

This idea may be extended to more characters

\[ \sum_{i=0}^{KeySize-1} Key[KeySize - i - 1] \cdot 32^i \]

The magic number 32 is used here, since multiplication can then be achieved by a left shift!


A better hash function for strings

Index
Hash3( const char *Key, int TableSize )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal = ( HashVal << 5 ) + *Key++;

    return HashVal % TableSize;
}
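The shift-based loop is Horner's rule applied to the base-32 sum. As a rough check, the sketch below compares the loop with a term-by-term evaluation of the sum. HornerShift and DirectSum are hypothetical names introduced here; both omit the final modulo, and both rely on unsigned arithmetic wrapping around in the same way for long keys.

```c
#include <stddef.h>
#include <string.h>

/* Shift-based evaluation, as in Hash3 but without the final modulo */
unsigned int HornerShift( const char *Key )
{
    unsigned int HashVal = 0;

    while( *Key != '\0' )
        HashVal = ( HashVal << 5 ) + *Key++;

    return HashVal;
}

/* Term-by-term evaluation of the sum of Key[KeySize - i - 1] * 32^i */
unsigned int DirectSum( const char *Key )
{
    size_t KeySize = strlen( Key );
    unsigned int Sum = 0;
    unsigned int Power = 1;             /* holds 32^i */

    for( size_t i = 0; i < KeySize; i++ )
    {
        Sum += (unsigned int) Key[ KeySize - i - 1 ] * Power;
        Power *= 32;
    }
    return Sum;
}
```

For the key "abc" both functions return 97 * 1024 + 98 * 32 + 99 = 102563, and they agree for longer keys as well.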


Collision Handling

Separate Chaining: use additional data structures to store the pairs having the same home index
Open Addressing: use only the array and systematically look for an alternate index
    Linear Probing
    Quadratic Probing
    Double Hashing
Rehashing to increase or reduce the TableSize
Extendible Hashing


Separate Chaining

The idea here is to use another collection data structure to keep all the pairs that hash to the same index
A simple linked list is a natural choice!




Data Structure

typedef struct PairStruct {
    KeyType key;
    ValType value;
} *Pair;

struct ListNode
{
    Pair Element;
    struct ListNode *Next;
};

typedef struct ListNode *Position;
typedef Position List;


Data Structure

struct HashTbl
{
    int TableSize;
    List *TheLists;
};

typedef struct HashTbl *HashTable;


Interface

// Constructor
HashTable InitializeTable( int TableSize );

// Destructor
void DestroyTable( HashTable H );

// Insert a key-value pair
void Insert( Pair pair, HashTable H );

// Find and Retrieve
Position Find( KeyType Key, HashTable H );
Pair Retrieve( Position P );
Pair FindRetrieve( KeyType Key, HashTable H );


Implementation: Constructor

HashTable
InitializeTable( int TableSize )
{
    HashTable H;
    int i;

    if( TableSize < MinTableSize )
    {
        Error( "Table size too small" );
        return NULL;
    }

    /* Allocate table */
    H = malloc( sizeof( struct HashTbl ) );
    if( H == NULL )
        FatalError( "Out of space!!!" );

    H->TableSize = NextPrime( TableSize );

    /* Allocate array of lists */
    H->TheLists = malloc( sizeof( List ) * H->TableSize );
    if( H->TheLists == NULL )
        FatalError( "Out of space!!!" );

    /* Allocate list headers */
    for( i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) );
        if( H->TheLists[ i ] == NULL )
            FatalError( "Out of space!!!" );
        else
            H->TheLists[ i ]->Next = NULL;
    }

    return H;
}


Implementation: Find

Position
Find( KeyType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;    /* skip the header node */
    /* != compares keys directly; string keys would need strcmp */
    while( P != NULL && P->Element->key != Key )
        P = P->Next;
    return P;
}
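The comparison with != in Find is appropriate when KeyType is an integer type. If the keys are C strings, as in the string hash functions earlier, != would compare pointers rather than contents. The sketch below shows a content comparison, assuming KeyType is const char *. KeysEqual is a hypothetical helper, not part of the slides.

```c
#include <string.h>

typedef const char *KeyType;   /* assumption: keys are C strings */

/* Hypothetical helper: nonzero when both keys hold the same characters */
int KeysEqual( KeyType A, KeyType B )
{
    return strcmp( A, B ) == 0;
}
```

In a string-keyed Find, the loop condition would then test !KeysEqual( P->Element->key, Key ) in place of the != comparison.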


Implementation: Retrieve

Pair
Retrieve( Position P )
{
    return P->Element;
}


Implementation: Find and Retrieve

Pair
FindRetrieve( KeyType Key, HashTable H )
{
    Position P;
    List L;

    L = H->TheLists[ Hash( Key, H->TableSize ) ];
    P = L->Next;
    while( P != NULL && P->Element->key != Key )
        P = P->Next;
    if( P == NULL )
        return NULL;
    else
        return P->Element;
}


Implementation: Insert (key, value) pair

void
Insert( Pair pair, HashTable H )
{
    Position Pos, NewCell;
    List L;

    KeyType Key = pair->key;

    Pos = Find( Key, H );
    if( Pos == NULL )   /* Key is not found */
    {
        NewCell = malloc( sizeof( struct ListNode ) );
        if( NewCell == NULL )
            FatalError( "Out of space!!!" );
        else
        {
            L = H->TheLists[ Hash( Key, H->TableSize ) ];
            NewCell->Next = L->Next;
            NewCell->Element = pair;
            L->Next = NewCell;
        }
    }
}


Implementation: Destructor

void
DestroyTable( HashTable H )
{
    /* Frees the header and list nodes; the Pair objects are not freed here */
    for( int i = 0; i < H->TableSize; i++ )
    {
        Position P = H->TheLists[ i ];
        Position Tmp;
        while( P != NULL )
        {
            Tmp = P->Next;
            free( P );
            P = Tmp;
        }
    }
    free( H->TheLists );
    free( H );
}
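The pieces above can be put together into one compact, compilable sketch. Several things here are assumptions rather than part of the slides: KeyType is int, ValType is const char *, MinTableSize is 3, FatalError and NextPrime are simple stand-ins, and Demo is a hypothetical usage function. The code is condensed for space and is not a definitive implementation.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int KeyType;            /* assumption: integer keys  */
typedef const char *ValType;    /* assumption: string values */
typedef unsigned int Index;     /* assumption */
#define MinTableSize 3          /* assumption */

typedef struct PairStruct { KeyType key; ValType value; } *Pair;
struct ListNode { Pair Element; struct ListNode *Next; };
typedef struct ListNode *Position;
typedef Position List;
struct HashTbl { int TableSize; List *TheLists; };
typedef struct HashTbl *HashTable;

static void FatalError( const char *Msg )    /* assumed helper */
{
    fprintf( stderr, "%s\n", Msg );
    exit( EXIT_FAILURE );
}

/* Assumed helper: smallest prime >= N (N >= 2), by trial division */
static int NextPrime( int N )
{
    for( ; ; N++ )
    {
        int IsPrime = 1;
        for( int d = 2; d * d <= N; d++ )
            if( N % d == 0 ) { IsPrime = 0; break; }
        if( IsPrime ) return N;
    }
}

static Index Hash( KeyType Key, int TableSize )
{
    return (Index) Key % (Index) TableSize;
}

HashTable InitializeTable( int TableSize )
{
    if( TableSize < MinTableSize ) return NULL;
    HashTable H = malloc( sizeof( struct HashTbl ) );
    if( H == NULL ) FatalError( "Out of space!!!" );
    H->TableSize = NextPrime( TableSize );
    H->TheLists = malloc( sizeof( List ) * H->TableSize );
    if( H->TheLists == NULL ) FatalError( "Out of space!!!" );
    for( int i = 0; i < H->TableSize; i++ )
    {
        H->TheLists[ i ] = malloc( sizeof( struct ListNode ) );  /* header */
        if( H->TheLists[ i ] == NULL ) FatalError( "Out of space!!!" );
        H->TheLists[ i ]->Next = NULL;
    }
    return H;
}

Pair FindRetrieve( KeyType Key, HashTable H )
{
    Position P = H->TheLists[ Hash( Key, H->TableSize ) ]->Next;
    while( P != NULL && P->Element->key != Key )
        P = P->Next;
    return P == NULL ? NULL : P->Element;
}

void Insert( Pair pair, HashTable H )
{
    if( FindRetrieve( pair->key, H ) != NULL ) return;  /* already present */
    Position NewCell = malloc( sizeof( struct ListNode ) );
    if( NewCell == NULL ) FatalError( "Out of space!!!" );
    List L = H->TheLists[ Hash( pair->key, H->TableSize ) ];
    NewCell->Element = pair;
    NewCell->Next = L->Next;     /* push at the front of the chain */
    L->Next = NewCell;
}

void DestroyTable( HashTable H )
{
    for( int i = 0; i < H->TableSize; i++ )
        for( Position P = H->TheLists[ i ], Tmp; P != NULL; P = Tmp )
        {
            Tmp = P->Next;
            free( P );
        }
    free( H->TheLists );
    free( H );
}

/* Usage sketch: 7, 18 and 29 all hash to index 7 when TableSize is 11 */
int Demo( void )
{
    struct PairStruct A = { 7, "seven" }, B = { 18, "eighteen" };
    HashTable H = InitializeTable( 10 );    /* NextPrime( 10 ) is 11 */
    Insert( &A, H );
    Insert( &B, H );
    int Ok = FindRetrieve( 7, H ) == &A && FindRetrieve( 18, H ) == &B
          && FindRetrieve( 29, H ) == NULL;
    DestroyTable( H );
    return Ok;
}
```

Keys 7 and 18 collide at index 7 and end up in the same chain, yet each is still found by walking that chain. Key 29 hashes to the same index but was never inserted, so the lookup returns NULL.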

Complexity

Is it possible to achieve constant average time with this design?
What will be the average length of a linked list?
Load factor λ: ratio of the number of pairs to the TableSize
The average length of a linked list will be λ
We may achieve constant average time if we keep the load factor λ close to 1 and the hash function distributes the keys well
Some other data structure, such as a BST or another hash table, may be tried instead of linked lists, but it may not be worth the effort
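The two conditions above are separate: λ fixes the average chain length, but only a well-distributing hash function keeps the longest chain short. The sketch below illustrates this with integer keys and key mod TableSize. LoadFactor and LongestChain are hypothetical helpers introduced here, and only chain lengths are tracked, not actual lists.

```c
#include <stdlib.h>

/* Load factor: number of stored pairs divided by TableSize */
double LoadFactor( int NumPairs, int TableSize )
{
    return (double) NumPairs / TableSize;
}

/* Hypothetical helper: place the keys 0, Step, 2*Step, ... (NumKeys of
   them) into chains with key % TableSize and return the length of the
   longest chain, or -1 if memory cannot be allocated. */
int LongestChain( int Step, int NumKeys, int TableSize )
{
    int Longest = 0;
    int *Length = calloc( TableSize, sizeof( int ) );
    if( Length == NULL )
        return -1;

    for( int i = 0; i < NumKeys; i++ )
    {
        int Slot = ( i * Step ) % TableSize;
        if( ++Length[ Slot ] > Longest )
            Longest = Length[ Slot ];
    }
    free( Length );
    return Longest;
}
```

With 22 keys and TableSize = 11 the load factor is 2 in both cases. The keys 0, 1, ..., 21 give a longest chain of 2, whereas the keys 0, 11, 22, ... all land on index 0 and produce a single chain of length 22.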


Summary

Several applications need collections that support only insertion, deletion, and search
Hashing is an ideal solution that can achieve constant average time
We have discussed some simple hash functions and the issues involved
Collision is a major issue in implementing hashing
Separate chaining is one way to handle collisions: use a secondary data structure, such as a linked list, to store all the objects hashing to the same index
Load factor λ needs to be close to 1 for effective separate chaining
