
HASHING

CSCI 203
Hash tables

1. Hash Tables
2. Direct-address tables
3. Hash tables

CSI 203 UoW Dubai H.M.


11/25/2018 Khelalfa 2
Hash tables
• Many applications require:
• A dynamic set (a set that can grow and
shrink).
 Dictionary operations:
• INSERT
• SEARCH
• DELETE

Hash table
• Effective for implementing dictionaries
• Searching for an element in a hash table can take as long as searching for an
element in a linked list: Θ(n) in the worst case
• But, with reasonable assumptions, the expected time to search in a hash table
is O(1)

Direct-address tables
 Simple technique
 Works well when universe U of Keys is
reasonably small

Direct-address tables
 Assume application needs a dynamic set
(dictionary) made of records.
 For example each student record may contain
several fields of attributes such as:
 Student ID number
 Name,
 Date of birth,
 Home address,
 Email
 Phone number

Direct-address tables
• Each record has a key drawn from the universe U
• U = {0, 1, 2, ..., m-1}
• Think of the key as a student ID, for example.
• Assume that m is not too large
• This dictionary (dynamic set) can be represented
by an array, called a direct-address table,
• where each position or slot corresponds to a key

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

The key is the student ID; the satellite data are the name, date of birth, sex, email, etc.
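
As a rough illustration of direct addressing, the sketch below stores each record in the slot whose index equals its key. The class name, the sample record, and the universe size m = 10 are assumptions made for this example only.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in the universe {0, ..., m-1}."""

    def __init__(self, m):
        self.slots = [None] * m  # one slot per possible key

    def insert(self, key, data):
        self.slots[key] = data   # slot k holds the record whose key is k

    def search(self, key):
        return self.slots[key]   # None means no record with this key

    def delete(self, key):
        self.slots[key] = None


# Hypothetical student record, keyed by a small student ID.
table = DirectAddressTable(m=10)
table.insert(3, {"name": "Alice", "email": "alice@example.com"})
found = table.search(3)
missing = table.search(7)
table.delete(3)
after_delete = table.search(3)
```

All three operations take O(1) time, but the table needs one slot for every possible key, whether or not that key is in use.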

Direct-address tables
• This simple technique becomes a problem when U is large:
1. Storing a table of size |U| can be impractical,
even impossible (memory requirement).
2. When the set K of keys stored in the dictionary
is much smaller than the set U of all possible
keys, we waste a lot of memory space with
direct addressing.
• The solution is hashing

Direct addressing vs. hashing
• Direct addressing: an element with key k is stored in slot k
• Hashing: an element with key k is stored in slot h(k)

1. The hash function h computes the slot from the key k
2. h maps the universe U of keys into the
slots of a hash table T[0..m-1]

$h : U \to \{0, 1, \ldots, m-1\}$
Hash table: basic idea
If the set of keys K stored in a dictionary is
much smaller than the universe U of
all possible keys,
a hash table requires much less
storage than direct addressing

Hash function requirements (1)
• Hash table size:
1. should not be excessively large
compared to the number of keys,
2. but should be sufficiently large not to
jeopardize the time efficiency of the
implementation

Hash function requirements (2)
 Hash function:
1. Needs to distribute the keys amongst
the cells of the table as uniformly as
possible
2. The function must be easy to
compute

Hash table
• Hash objective: instead of handling
|U| values, we need only handle
m values.
• Storage requirements are therefore
reduced
• BUT ...

Hash table
• There is a potential problem: collision
• Two keys may hash to the same slot.

Hash table
• Collision resolution:
  • Chaining (open hashing, also called separate chaining)
  • Closed hashing (open addressing)

Hash table- chaining
 All elements that hash to the same slot are
put in a linked list
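
A minimal sketch of chaining, assuming integer keys and the hash function h(k) = k mod m (both chosen here for illustration). Python lists stand in for the linked lists, and the class and method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one (initially empty) chain per slot

    def _h(self, key):
        return key % self.m  # assumed hash function for integer keys

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                 # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))       # otherwise add it to the chain

    def search(self, key):
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None                      # key not in the chain

    def delete(self, key):
        idx = self._h(key)
        self.slots[idx] = [(k, v) for k, v in self.slots[idx] if k != key]


t = ChainedHashTable(m=7)
t.insert(10, "ten")
t.insert(17, "seventeen")    # 10 and 17 both hash to slot 3: a collision
chain_length = len(t.slots[3])
value_17 = t.search(17)
t.delete(10)
value_10 = t.search(10)
```

Both colliding keys stay retrievable because search walks the whole chain of the slot rather than looking at a single cell.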

How well does hashing with chaining
perform?
• Assume a hash table with m slots
that stores n elements
• Worst case is terrible:
1. All n keys hash to the same slot,
2. creating a list of length n.
3. Worst-case search takes Θ(n) + the time to
compute the hash function
Load factor

The load factor of table T is:

$\alpha = n/m$

That is, the average number of elements
stored in a chain

Open chaining
• Keys are stored in linked lists
attached to cells of a hash table
• Each list contains all the keys hashed
to its cell

Open chaining example
• List of objects:
  • A, FOOL, AND, HIS, MONEY, ARE,
SOON, PARTED
• Example of h function:
  • Add the positions of a word's
letters in the alphabet
  • Compute the sum's remainder
after division by 13
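
The hash function described above can be sketched as follows. The function name is an assumption, and the code assumes the words contain only the letters A to Z.

```python
def letter_sum_hash(word, m=13):
    """Sum the alphabet positions of the letters (A=1, ..., Z=26), then take mod m."""
    total = sum(ord(ch) - ord("A") + 1 for ch in word.upper())
    return total % m


words = ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]
addresses = [letter_sum_hash(w) for w in words]
print(addresses)  # [1, 9, 6, 10, 7, 11, 11, 12]
```

ARE and SOON both hash to 11, so this small word list already produces a collision.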
Open chaining example
h(A) = 1 mod 13 = 1
h(FOOL) = (6 + 15 + 15 + 12) mod 13 = 9

Keys:          A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
Hash address:  1   9      6     10    7       11    11     12

Resulting table (cells 0 to 12; cells not listed are empty):
Cell 1:  A
Cell 6:  AND
Cell 7:  MONEY
Cell 9:  FOOL
Cell 10: HIS
Cell 11: ARE -> SOON
Cell 12: PARTED
How do we perform a search?
• We just apply to the search key the
same procedure used in creating the
table.
• Example: search for the key KID in the
hash table
  • Compute h(KID) = 11
  • Look up the list attached to cell 11

Open chaining example
h(KID) = 11: we must traverse the linked list attached to cell 11 (ARE, SOON).
Open chaining example
• Let $\alpha = n/m$ be the load factor
• It describes the distribution of n keys among m cells
• S = average number of pointers
inspected in a successful search
• U = average number of pointers
inspected in an unsuccessful search

Open chaining example

$S \approx 1 + \frac{\alpha}{2}$

$U = \alpha$
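
As a small numeric check, the sketch below evaluates the usual chaining estimates S ≈ 1 + α/2 and U = α for the eight-word example stored in a 13-cell table. The function name is an assumption.

```python
def chaining_search_costs(n, m):
    """Return (alpha, S, U) for a chained hash table with n keys and m cells."""
    alpha = n / m          # load factor
    s = 1 + alpha / 2      # estimated pointers inspected, successful search
    u = alpha              # estimated pointers inspected, unsuccessful search
    return alpha, s, u


alpha, s, u = chaining_search_costs(n=8, m=13)
print(f"{alpha:.3f} {s:.3f} {u:.3f}")  # 0.615 1.308 0.615
```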
Open chaining example
• Load factor should be close to 1.
• What if it is too small?
  • Lots of empty lists: inefficient use of
space
• What if it is too large?
  • Longer linked lists: longer search
times
Closed hashing
 All keys are stored in the hash table
itself
 Without using linked lists
 What does this imply for the table size
m, given we have n keys?
 Table size m must be at least equal to n

Closed hashing
• What happens
if there is a
collision?

Closed hashing: linear probing as a
solution to collisions

Check the cell following the one
where the collision occurs.
• If that cell is empty, the new key is stored there.
• If it is not, the availability of the cell's immediate
successor is checked, and so on.
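
The procedure can be sketched as below, reusing the letter-sum hash from the chaining example with a 13-cell table. The function names are assumptions, and the sketch wraps around to cell 0 when it runs past the last cell.

```python
def letter_sum_hash(word, m):
    """Sum of alphabet positions (A=1, ..., Z=26) modulo m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


def lp_insert(table, key):
    """Store key in the first empty cell at or after h(key); return that cell."""
    m = len(table)
    i = letter_sum_hash(key, m)
    for _ in range(m):
        if table[i] is None:
            table[i] = key
            return i
        i = (i + 1) % m          # try the next cell, wrapping around
    raise RuntimeError("table is full")


def lp_search(table, key):
    """Return the cell holding key, or None if an empty cell is reached first."""
    m = len(table)
    i = letter_sum_hash(key, m)
    for _ in range(m):
        if table[i] is None:
            return None          # empty cell: the key cannot be further along
        if table[i] == key:
            return i
        i = (i + 1) % m
    return None


table = [None] * 13
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    lp_insert(table, word)
```

With this input, SOON collides with ARE at cell 11 and lands in cell 12; PARTED then finds cell 12 taken and wraps around to cell 0.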
Exercise
Linear probing
Keys:          A   FOOL   AND   HIS   MONEY   ARE   SOON   PARTED
Hash address:  1   9      6     10    7       11    11     12

Table cells: 0 1 2 3 4 5 6 7 8 9 10 11 12

Insertion steps (linear probing):
A      -> h = 1,  stored in cell 1
FOOL   -> h = 9,  stored in cell 9
AND    -> h = 6,  stored in cell 6
HIS    -> h = 10, stored in cell 10
MONEY  -> h = 7,  stored in cell 7
ARE    -> h = 11, stored in cell 11
SOON   -> h = 11, cell 11 is occupied, stored in cell 12
PARTED -> h = 12, cell 12 is occupied, wraps around, stored in cell 0

Final table:
Cell:  0       1  2  3  4  5  6    7      8  9     10   11   12
Key:   PARTED  A  -  -  -  -  AND  MONEY  -  FOOL  HIS  ARE  SOON
Search for KID, h(KID) = 11
We compare KID with ARE, SOON, PARTED, A (cells 11, 12, 0, 1).
Cell 2 is empty, so at that point we stop: the search is unsuccessful.
Search for LIT, h(LIT) = 2
Cell 2 is empty, so we stop immediately: the search is unsuccessful.
Assume we delete the key ARE. Cell 11 becomes empty:

Cell:  0       1  2  3  4  5  6    7      8  9     10   11  12
Key:   PARTED  A  -  -  -  -  AND  MONEY  -  FOOL  HIS  -   SOON
Assume that after deleting ARE, we search for
the key SOON: h(SOON) = 11.
Cell 11 is now empty, so the search stops there and is reported as unsuccessful,
even though SOON is still stored in cell 12.
Simple solution
• Lazy deletion
• Mark previously occupied cells with a
special symbol to distinguish them
from cells that have never been
occupied.
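
A minimal sketch of this idea, assuming the same letter-sum hash and a 13-cell table as in the earlier example. The `DELETED` marker and the class name are hypothetical. For brevity, `insert` assumes the key is not already in the table.

```python
DELETED = object()  # special marker for a cell that was occupied and then freed


def h(word, m):
    """Sum of alphabet positions (A=1, ..., Z=26) modulo m."""
    return sum(ord(ch) - ord("A") + 1 for ch in word.upper()) % m


class LinearProbingTable:
    def __init__(self, m):
        self.cells = [None] * m  # None means "never occupied"

    def insert(self, key):
        m = len(self.cells)
        i = h(key, m)
        for _ in range(m):
            if self.cells[i] is None or self.cells[i] is DELETED:
                self.cells[i] = key      # marked cells may be reused
                return i
            i = (i + 1) % m
        raise RuntimeError("table is full")

    def search(self, key):
        m = len(self.cells)
        i = h(key, m)
        for _ in range(m):
            cell = self.cells[i]
            if cell is None:
                return None              # only a never-occupied cell stops the search
            if cell is not DELETED and cell == key:
                return i
            i = (i + 1) % m              # skip over marked cells
        return None

    def delete(self, key):
        i = self.search(key)
        if i is not None:
            self.cells[i] = DELETED      # mark the cell instead of emptying it


t = LinearProbingTable(13)
for word in ["A", "FOOL", "AND", "HIS", "MONEY", "ARE", "SOON", "PARTED"]:
    t.insert(word)
t.delete("ARE")
soon_cell = t.search("SOON")
are_cell = t.search("ARE")
```

After ARE is deleted, the search for SOON passes over the marked cell 11 and still finds SOON in cell 12.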

Linear probing complexity

$S \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right)$

$U \approx \frac{1}{2}\left(1 + \frac{1}{(1-\alpha)^2}\right)$

Linear probing
α      S      U
50%    1.5    2.5
75%    2.5    8.5
90%    5.5    50.5
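
The table values can be reproduced from the standard linear probing approximations S ≈ ½(1 + 1/(1−α)) and U ≈ ½(1 + 1/(1−α)²). The function name below is an assumption.

```python
def linear_probing_costs(alpha):
    """Approximate probes per search (S successful, U unsuccessful) at load factor alpha."""
    s = 0.5 * (1 + 1 / (1 - alpha))
    u = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    return s, u


for alpha in (0.5, 0.75, 0.9):
    s, u = linear_probing_costs(alpha)
    print(f"{alpha:.0%}  S={s:.1f}  U={u:.1f}")
```

The cost of an unsuccessful search grows much faster than that of a successful one as α approaches 1.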

Linear probing- clustering
• When the table gets closer to being full
  • Performance degrades
• Clustering phenomenon
  • Here, a cluster means a sequence of
contiguously occupied cells, with
possible wrapping
• Clusters make dictionary operations inefficient

Linear probing- clustering
• As clusters become larger
• the probability that a new element will
be attached to a cluster increases

One solution: Double hashing

• Another hash function s(K) is used
• It determines a fixed increment for the
probing sequence, to be used after a
collision at location l = h(K):
  • (l + s(K)) mod m,
  • (l + 2s(K)) mod m,
  • ...

Double hashing
• To guarantee that every location in the table is
probed by the sequence (a),
• the increment s(K) and the table size m must be
relatively prime (their gcd = 1)
• Some recommend
  • s(K) = m - 2 - (K mod (m - 2)) or s(K) = 8 - (K mod 8) for
small tables
  • s(K) = (K mod 97) + 1 for large tables

$(l + s(K)) \bmod m,\ (l + 2s(K)) \bmod m,\ \ldots \qquad (a)$
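
A sketch of double hashing on integer keys, assuming h(K) = K mod m with a table size of m = 13 and the small-table increment s(K) = 8 − (K mod 8) mentioned above. The function names and sample keys are chosen for illustration.

```python
from math import gcd


def h(k, m):
    return k % m            # assumed primary hash function


def s(k):
    return 8 - (k % 8)      # increment in 1..8, one of the small-table suggestions


def probe_sequence(k, m):
    """Cells visited for key k: l, (l + s) mod m, (l + 2s) mod m, ..."""
    start, step = h(k, m), s(k)
    assert gcd(step, m) == 1, "increment and table size must be relatively prime"
    return [(start + i * step) % m for i in range(m)]


def insert(table, k):
    for i in probe_sequence(k, len(table)):
        if table[i] is None:
            table[i] = k
            return i
    raise RuntimeError("table is full")


table = [None] * 13
positions = [insert(table, k) for k in (5, 18, 31)]  # all three keys hash to cell 5
print(positions)  # [5, 11, 6]
```

Because 13 is prime, every increment from 1 to 8 is relatively prime to it, so each probe sequence visits all 13 cells. The three colliding keys also move on in different strides instead of piling into one cluster.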
Exercise- open hashing
• For the input 30, 30, 36, 75, 31, 19
and the hash function h(K) = K mod 11:
  • Construct the open hash table
  • Find the largest number of key comparisons
in a successful search in this table
  • Find the average number of key
comparisons in a successful search in this
table
