
# HASHING

Hashing is a technique for performing insertions, deletions and searches in constant average time.
Hashing is a key-to-address transformation technique.
HASH TABLE:
A hash table is the data structure used in hashing.
The ideal hash table data structure is an array of some fixed size, containing the items.
A search is performed based on a key.
Each key is mapped into some position in the range 0 to TableSize-1.
The mapping is called the hash function.
HASH FUNCTION
A hash function is a key-to-address transformation that acts upon a given key to compute the relative position of the key in an array.
The mapping of each key into some number in the range 0 to TableSize-1 is known as hashing.
Ideally, the hash function is used to determine the location of any record given its key value.
The hash function should have the following properties:
- Simple to compute
- Must distribute the data evenly
- Generates a lower number of collisions
Example of a good hash function:
Hash(Key) = Key % TableSize
Routine for Good Hash Function

    typedef unsigned int INDEX;

    /* Shift-and-add hash for a NUL-terminated string key. */
    INDEX
    hash( const char *key, unsigned int H_SIZE )
    {
        unsigned int hash_val = 0;

        /* Multiply the running value by 32, then add the next character. */
        while( *key != '\0' )
            hash_val = ( hash_val << 5 ) + *key++;
        return( hash_val % H_SIZE );
    }
Collision:
When a memory location is already filled and another value maps to the same location, a collision occurs.
When an element is inserted and it hashes to the same value as an already inserted element, it produces a collision, which needs to be resolved.
Collision resolving methods:
- Separate chaining (or) Open Hashing
- Open addressing (or) Closed Hashing
Separate chaining (or) external hashing:
Separate chaining is an open hashing technique.
Separate chaining is a collision resolution technique in which we keep a list of all elements that hash to the same value. It is called separate chaining because each hash table element is a separate chain (linked list).
Each linked list contains all the elements whose keys hash to the same index.
Example:
Hash function: h(K) = K mod 10
Keys to insert: {0, 81, 64, 25, 36, 49, 1, 4, 16, 9}
It doesn't require prior knowledge of the number of elements that are to be stored in the hash table (i.e.,) dynamic allocation is done.
The elements having the same memory address will be in the same chain.
The elements are not evenly distributed. Some chains may have more elements and some may not have anything.
It requires pointers, which require more space.
Memory allocation in linked list manipulation will slow down the program.
    typedef struct list_node *node_ptr;

    struct list_node
    {
        element_type element;
        node_ptr next;
    };

    typedef node_ptr LIST;
    typedef node_ptr position;

    /* LIST *the_list will be an array of lists, allocated later */
    /* The lists will use headers, allocated later */
    struct hash_tbl
    {
        unsigned int table_size;
        LIST *the_lists;
    };

    typedef struct hash_tbl *HASH_TABLE;

Routine for type declaration for open hash table
    void insert( element_type key, HASH_TABLE H )
    {
        position pos, new_cell;
        LIST L;

        /* Insert only if the key is not already present. */
        pos = find( key, H );
        if( pos == NULL )
        {
            new_cell = (position) malloc( sizeof(struct list_node) );
            if( new_cell == NULL )
                fatal_error( "Out of space!!!" );
            else
            {
                /* Link the new cell in right after the list header. */
                L = H->the_lists[ hash( key, H->table_size ) ];
                new_cell->next = L->next;
                new_cell->element = key;   /* Probably need strcpy!! */
                L->next = new_cell;
            }
        }
    }

Insert routine for open hash table
    position find( element_type key, HASH_TABLE H )
    {
        position p;
        LIST L;

        L = H->the_lists[ hash( key, H->table_size ) ];
        p = L->next;
        while( (p != NULL) && (p->element != key) )
            /* Probably need strcmp!! */
            p = p->next;
        return p;
    }

Find routine for open hash table
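As a rough illustration of separate chaining, the following self-contained C sketch builds the chains for the example keys {0, 81, 64, 25, 36, 49, 1, 4, 16, 9} with h(K) = K mod 10. It assumes non-negative integer keys and lists without header cells; the names `chain_node`, `chain_hash`, `chain_find` and `chain_insert` are hypothetical and are not part of the routines in these notes.

```c
#include <stdlib.h>

#define TABLE_SIZE 10   /* assumed table size, matching h(K) = K mod 10 */

/* One cell of a chain; integer keys are an assumption of this sketch. */
struct chain_node {
    int key;
    struct chain_node *next;
};

/* Array of chain heads; static storage starts out as all NULL. */
static struct chain_node *table[TABLE_SIZE];

/* h(K) = K mod TABLE_SIZE, for non-negative keys. */
static unsigned int chain_hash(int key)
{
    return (unsigned int) key % TABLE_SIZE;
}

/* Return the node holding key, or NULL if it is not in the table. */
struct chain_node *chain_find(int key)
{
    struct chain_node *p = table[chain_hash(key)];

    while (p != NULL && p->key != key)
        p = p->next;
    return p;
}

/* Insert key at the front of its chain; duplicates are ignored.
   Returns 1 on success (or if already present), 0 if malloc fails. */
int chain_insert(int key)
{
    struct chain_node *cell;
    unsigned int slot = chain_hash(key);

    if (chain_find(key) != NULL)
        return 1;
    cell = malloc(sizeof *cell);
    if (cell == NULL)
        return 0;
    cell->key = key;
    cell->next = table[slot];
    table[slot] = cell;
    return 1;
}
```

Inserting the ten keys in the order given leaves, for example, 1 and 81 together in chain 1 and 4 and 64 together in chain 4, with the most recently inserted key at the front of each chain, while chains 2, 3, 7 and 8 stay empty.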
Open addressing (or) Closed Hashing
Closed hashing, also known as open addressing, is an alternative to resolving collisions with linked lists.
In a closed hashing system, if a collision occurs, alternate cells are tried until an empty cell is found.
The general format for finding a cell is
H_i(X) = (Hash(X) + F(i)) mod TableSize
Three common collision resolution strategies in open addressing are
1. Linear Probing
2. Quadratic Probing
3. Double Hashing
Linear Probing:
In linear probing, F(i) is a linear function of i, typically F(i) = i.
This amounts to trying cells sequentially (with wraparound) in search of an empty cell.
Figure: result of inserting keys {89, 18, 49, 58, 69} with linear probing.
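The probing sequence for F(i) = i can be sketched in C as follows. This is a minimal sketch, not a definitive implementation: it assumes a table of size 10 with Hash(X) = X mod 10 (the notes do not state the table size for this example), non-negative integer keys, and -1 as the marker for an empty cell. The names `lp_init` and `lp_insert` are hypothetical.

```c
#include <stddef.h>

#define TABLE_SIZE 10   /* assumed table size for this sketch */
#define EMPTY (-1)      /* assumed marker for an unused cell; keys are non-negative */

/* Mark every cell of the table as empty. */
void lp_init(int table[TABLE_SIZE])
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using linear probing, F(i) = i.
   Returns the cell used, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (home + i) % TABLE_SIZE;   /* sequential probe with wraparound */
        if (table[cell] == EMPTY || table[cell] == key) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;
}
```

Under these assumptions, inserting 89, 18, 49, 58 and 69 in that order places them in cells 9, 8, 0, 1 and 2 respectively. The last three keys all end up in one contiguous run after wrapping around, which is the clustering effect.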
Advantages of linear probing:
1. It does not require pointers.
2. It is very simple to implement.
Disadvantages of linear probing:
1. It forms clusters, which degrades the performance of the hash table for storing and retrieving data.
2. If any collision occurs when the hash table becomes half full, it is difficult to find an empty location in the hash table and hence the insertion process takes a longer time. This is called the primary clustering problem.
2. Quadratic Probing
Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing. The collision function is quadratic, typically F(i) = i^2.
Figure: result of inserting keys {89, 18, 49, 58, 69} with quadratic probing.
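A minimal C sketch of quadratic probing with F(i) = i^2 is given below. As assumptions not stated in the notes, it uses a table of size 10 with Hash(X) = X mod 10, non-negative integer keys, and -1 as the empty marker. The names `qp_init` and `qp_insert` are hypothetical.

```c
#include <stddef.h>

#define TABLE_SIZE 10   /* assumed table size for this sketch */
#define EMPTY (-1)      /* assumed marker for an unused cell; keys are non-negative */

/* Mark every cell of the table as empty. */
void qp_init(int table[TABLE_SIZE])
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using quadratic probing, F(i) = i * i.
   Gives up after TABLE_SIZE probes, so it can return -1
   even when an empty cell still exists. */
int qp_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;

    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (home + i * i) % TABLE_SIZE;   /* probe at distance i^2 */
        if (table[cell] == EMPTY || table[cell] == key) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;
}
```

Under these assumptions, inserting 89, 18, 49, 58 and 69 in that order places them in cells 9, 8, 0, 2 and 3. Compared with sequential probing, 58 and 69 jump past the occupied run instead of extending it.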
Advantages of quadratic probing
- Easy to implement
- Less clustering
Disadvantages of quadratic probing
- It gives rise to the secondary clustering problem.
- If the table size N is not prime, quadratic probing may not find an empty slot even if one exists.
- Even if N is prime, it may not find an empty slot once the table is more than half full.
DOUBLE HASHING
For double hashing, the collision function is F(i) = i * h_2(x).
We apply a second hash function to x and probe at a distance h_2(x), 2h_2(x), ..., and so on.
An example of a secondary function is h_2(x) = R - (x mod R), where R is a prime smaller than TABLE_SIZE.
Figure: result of inserting keys {89, 18, 49, 58, 69} with double hashing.
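The following C sketch illustrates double hashing with F(i) = i * h_2(x) and h_2(x) = R - (x mod R). The concrete values are assumptions chosen for illustration: a table of size 10 with Hash(X) = X mod 10, R = 7, non-negative integer keys, and -1 as the empty marker. The names `dh_init`, `dh_hash2` and `dh_insert` are hypothetical.

```c
#include <stddef.h>

#define TABLE_SIZE 10   /* assumed table size for this sketch */
#define R 7             /* assumed prime smaller than TABLE_SIZE */
#define EMPTY (-1)      /* assumed marker for an unused cell; keys are non-negative */

/* Mark every cell of the table as empty. */
void dh_init(int table[TABLE_SIZE])
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Second hash function h_2(x) = R - (x mod R); always in the range 1..R. */
int dh_hash2(int key)
{
    return R - (key % R);
}

/* Insert key using double hashing, F(i) = i * h_2(key).
   Gives up after TABLE_SIZE probes; because this table size is not prime,
   the probe sequence may not reach every cell. */
int dh_insert(int table[TABLE_SIZE], int key)
{
    int home = key % TABLE_SIZE;
    int step = dh_hash2(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int cell = (home + i * step) % TABLE_SIZE;
        if (table[cell] == EMPTY || table[cell] == key) {
            table[cell] = key;
            return cell;
        }
    }
    return -1;
}
```

Under these assumptions, 89 and 18 go to cells 9 and 8 without collision. Key 49 collides at cell 9 and steps by h_2(49) = 7 to cell 6, 58 collides at cell 8 and steps by 5 to cell 3, and 69 collides at cell 9 and steps by 1 to cell 0.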
Advantages of double hashing
- It drastically reduces clustering and requires fewer comparisons than linear probing.
Disadvantages of double hashing
- As with linear and quadratic probing, performance degrades as the table fills up.
REHASHING
If the table gets too full, the running time for the operations will start taking too long and inserts might fail for closed hashing with quadratic resolution.
This can happen if there are too many deletions intermixed with insertions.
A solution, then, is to build another table that is about twice as big (with an associated new hash function) and scan down the entire original hash table, computing the new hash value for each (non-deleted) element and inserting it in the new table.
Rehashing can occur:
- once the table becomes half full
- once an insertion fails
- once a specific load factor has been reached, where the load factor is the ratio of the number of elements in the hash table to the table size
EXAMPLE:
Suppose the elements 13, 15, 24, and 6 are inserted into a closed hash table of size 7. The hash function is h(x) = x mod 7. Suppose linear probing is used to resolve collisions. The resulting hash table holds 6 in cell 0, 15 in cell 1, 24 in cell 3, and 13 in cell 6.
After inserting 23, which goes into cell 2, the resulting table will be over 70 percent full. Because the table is so full, a new table is created.
The size of this table is 17, because this is the first prime that is at least twice as large as the old table size.
The new hash function is then h(x) = x mod 17. The old table is scanned, and elements 6, 15, 23, 24, and 13 are inserted into the new table. The resulting table holds 6 in cell 6, 23 in cell 7, 24 in cell 8, 13 in cell 13, and 15 in cell 15.
    HASH_TABLE rehash( HASH_TABLE H )
    {
        unsigned int i, old_size;
        cell *old_cells;

        old_cells = H->the_cells;
        old_size = H->table_size;

        /* Get a new, empty table */
        H = initialize_table( 2*old_size );

        /* Scan through old table, reinserting into new */
        for( i=0; i<old_size; i++ )
            if( old_cells[i].info == legitimate )
                insert( old_cells[i].element, H );
        free( old_cells );
        return H;
    }
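To make the rehashing example concrete (elements 13, 15, 24, 6 and 23 in a table of size 7, rehashed into a table of size 17), here is a self-contained C sketch. It is a simplified stand-in for the routine above, not a reconstruction of it: it assumes non-negative integer keys, linear probing, no deletions, and -1 as the empty marker. The names `probe_table`, `next_prime`, `table_create`, `table_insert` and `table_rehash` are hypothetical.

```c
#include <stdlib.h>

#define EMPTY (-1)   /* assumed marker for an unused cell; keys are non-negative */

struct probe_table {
    unsigned int size;
    int *cells;
};

/* Smallest prime >= n, by trial division (adequate for small tables). */
static unsigned int next_prime(unsigned int n)
{
    if (n < 2)
        return 2;
    for (;; n++) {
        unsigned int d;
        for (d = 2; d * d <= n; d++)
            if (n % d == 0)
                break;
        if (d * d > n)
            return n;
    }
}

/* Allocate a table of the given size with every cell empty; NULL on failure. */
struct probe_table *table_create(unsigned int size)
{
    struct probe_table *t = malloc(sizeof *t);

    if (t == NULL)
        return NULL;
    t->cells = malloc(size * sizeof *t->cells);
    if (t->cells == NULL) {
        free(t);
        return NULL;
    }
    t->size = size;
    for (unsigned int i = 0; i < size; i++)
        t->cells[i] = EMPTY;
    return t;
}

/* Linear-probing insert; returns the cell used, or -1 if the table is full. */
int table_insert(struct probe_table *t, int key)
{
    unsigned int home = (unsigned int) key % t->size;

    for (unsigned int i = 0; i < t->size; i++) {
        unsigned int cell = (home + i) % t->size;
        if (t->cells[cell] == EMPTY) {
            t->cells[cell] = key;
            return (int) cell;
        }
    }
    return -1;
}

/* Build a table whose size is the first prime >= twice the old size,
   reinsert every occupied cell in scan order, and free the old table. */
struct probe_table *table_rehash(struct probe_table *old)
{
    struct probe_table *fresh = table_create(next_prime(2 * old->size));

    if (fresh == NULL)
        return old;   /* keep the old table if allocation fails */
    for (unsigned int i = 0; i < old->size; i++)
        if (old->cells[i] != EMPTY)
            table_insert(fresh, old->cells[i]);
    free(old->cells);
    free(old);
    return fresh;
}
```

Starting from a table of size 7, inserting 13, 15, 24, 6 and 23 fills cells 6, 1, 3, 0 and 2. Calling `table_rehash` then produces a table of size 17 in which 6, 23 and 24 occupy cells 6, 7 and 8, while 13 and 15 sit in cells 13 and 15.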
Advantages of rehashing
- Simple to implement
- The programmer doesn't have to worry about the table size
- Can be used in other data structures as well
EXTENSIBLE HASHING
When the table gets too full, an extremely expensive rehashing step must be performed, which requires O(n) disk accesses.
In extensible hashing, the root of the "tree" contains four pointers determined by the leading two bits of the data. Each leaf has up to M = 4 elements.
It happens that in each leaf the first two bits are identical; this is indicated by the number in parentheses. To be more formal, D will represent the number of bits used by the root, which is sometimes known as the directory.
The number of entries in the directory is thus 2^D.
d_l is the number of leading bits that all the elements of some leaf l have in common. d_l will depend on the particular leaf, and d_l ≤ D.
Suppose that we want to insert the key 100100. This would go into the third leaf, but as the third leaf is already full, there is no room. We thus split this leaf into two leaves, which are now determined by the first three bits. This requires increasing the directory size to D = 3.
If the key 000000 is now inserted, then the first leaf is split, generating two leaves with d_l = 3. Since D = 3, the only change required in the directory is the updating of the 000 and 001 pointers.
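The directory lookup described above (selecting a leaf by the leading D bits of the key) can be sketched in C as follows. This covers only the indexing step, not leaf splitting. It assumes 6-bit keys, as in the examples 100100 and 000000; the names `KEY_BITS`, `directory_index` and `directory_entries` are hypothetical.

```c
#define KEY_BITS 6   /* assumed key width, matching the 6-bit keys in the example */

/* Directory slot for key: the value of its D leading bits.
   Assumes key < 2^KEY_BITS and 0 <= D <= KEY_BITS. */
unsigned int directory_index(unsigned int key, unsigned int D)
{
    return key >> (KEY_BITS - D);
}

/* Number of directory entries when the directory uses D bits: 2^D. */
unsigned int directory_entries(unsigned int D)
{
    return 1u << D;
}
```

With D = 2 the directory has 4 entries and the key 100100 (0x24) maps to slot 10 in binary, which is the third leaf. With D = 3 the directory has 8 entries and the same key maps to slot 100, while 000000 maps to slot 000.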
Advantages of extensible hashing
- A larger number of elements can be inserted, as it uses an array of linked lists.
Disadvantages of extensible hashing
- This algorithm does not work if there are more than M duplicates.
- If the elements in a leaf agree in more than D+1 leading bits, then several directory splits are possible.
- The expected size of the directory is O(N^(1+1/M) / M).
Possible 2 Marks
1. What is hashing?
2. Write a routine to perform a hash function.
3. What is a collision? What are the different collision resolving techniques?
5. What is separate chaining?
6. What is primary clustering?
7. What is rehashing?
8. What is extensible hashing?
9. Comparison of open hashing and closed hashing
Comparison of open hashing and closed hashing

| OPEN HASHING | CLOSED HASHING |
| --- | --- |
| 1. It is also called separate chaining. | 1. It is also called open addressing. |
| 2. Implemented using linked lists with pointers. | 2. Does not require pointers. |
| 3. Hash table size is comparatively small. | 3. Hash table size is large because all data goes inside the table. |
| 4. Insert and find operations require more time. | 4. Faster than open hashing. |
Load factor:
The load factor of a hash table is the ratio of the number of elements in the hash table to the table size (i.e.,) the load factor λ of a hash table with n elements is given by the following formula:
λ = n / TableSize
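As a small worked illustration of the load factor formula, the ratio can be computed in C as below. The function name `load_factor` is hypothetical, and the sketch assumes a non-zero table size.

```c
/* Load factor: number of stored elements divided by the table size.
   Assumes table_size is non-zero. */
double load_factor(unsigned int n, unsigned int table_size)
{
    return (double) n / (double) table_size;
}
```

For the rehashing example, 5 elements in a table of size 7 give a load factor of about 0.71, which drops to about 0.29 once the same elements are moved into a table of size 17.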