Hashing

UNIT I HASHING
Syllabus:
1. Hashing
2. General Idea
3. Hash Function
4. Collision resolution
5. Separate Chaining
6. Open Addressing
7. Linear Probing
8. Double hashing
9. Bucket hashing
10. Priority Queues (Heaps)
11. Binary Heap
Why do we use hashing?
We have all used a dictionary, and many of us have a word processor equipped with a
limited dictionary, that is, a spelling checker. We consider the dictionary as an ADT. Examples
of dictionaries are found in many applications, including the spelling checker, the thesaurus, the
data dictionary found in database management applications, and the symbol tables generated by
loaders, assemblers, and compilers.
In computer science, we generally use the term symbol table rather than dictionary, when
referring to the ADT. Viewed from this perspective, we define the symbol table as a set of
name-attribute pairs. The characteristics of the name and attribute vary according to the
application. For example, in a thesaurus, the name is a word, and the attribute is a list of
synonyms for the word; in a symbol table for a compiler, the name is an identifier, and the
attributes might include an initial value and a list of lines that use the identifier.
Generally we would want to perform the following operations on any symbol table:
(1) Determine if a particular name is in the table
(2) Retrieve the attributes of that name
(3) Modify the attributes of that name
(4) Insert a new name and its attributes
(5) Delete a name and its attributes
There are only three basic operations on symbol tables:
1. Searching,
2. Inserting,
3. Deleting.
One technique that supports these basic operations efficiently is hashing. Unlike search tree
methods that rely on identifier comparisons to perform a search, hashing relies on a formula
called the hash function.
Definition:
Hashing provides a function 'h', called a hash function (or randomizing function),
that is applied to the hash field value of a record and yields the address of the disk block in which
the record is stored. A hash table is a table that can be searched for an item in O(1) expected time
by using a hash function to form an address from the key.
Features of Hashing:
Hashing is an approach for storing and searching data, so most of the work concerns how
the data is placed. The main terms used to describe hashing are:
 Randomizing:
Spreading the data or records randomly over the whole storage space.
 Collision:
When two different keys hash to the same address. This is the one
major problem in hashing, and it is discussed later in this unit.
Limitations:
 Hashing provides very fast access to records under certain search conditions. A file
organized this way is usually called a hash file.
 The search condition must be an equality condition on a single field, called the hash field
of the file. The hash field is also called the hash key.
 The idea behind hashing is also used for internal search within a program whenever a
group of records is accessed exclusively by using the value of one field.
Examples:
Given the values {2341, 4234, 2839, 430, 22, 397, 3920}, a hash table of size 7, and hash
function h(x) = x mod 7, show the resulting tables after inserting the values in the given order
with each of these collision strategies.
Hashing Functions:
Several kinds of uniform hashing functions are in use.
Direct hashing:
The key is the address without any algorithmic manipulation. The data structure must
therefore contain an element for every possible key. While the situations where direct hashing
can be used are limited, when it can be used it is very powerful because it guarantees that there
are no collisions.
Limitation: impractical when the range of key values is large.
Mid-Square (middle of Square):
The key is squared and the middle digits of the result are used as the address.
9452 * 9452 = 89340304, address = 3403 (middle four digits)
As a variation on the mid-square method, we can select a portion of the key, such as the
first three digits, and then square that portion rather than the whole key. This allows the method
to be used when the key is too large to square.
379452: 379 * 379 = 143641, address = 364
121267: 121 * 121 = 014641, address = 464
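The mid-square figures above can be reproduced with a short sketch. The function names mid_square4 and mid_square3 are illustrative assumptions, as is the choice of which digits count as "middle" (taken here to match the worked numbers).

```c
/* Mid-square for a 4-digit key: square it (up to 8 digits) and keep
   the middle four digits. Name and digit positions are assumptions. */
unsigned mid_square4(unsigned key)
{
    unsigned long sq = (unsigned long)key * key;
    return (unsigned)((sq / 100) % 10000);   /* drop 2 low digits, keep 4 */
}

/* Variation: square a 3-digit portion of the key (square padded to
   6 digits) and keep digits 3 to 5 counted from the left. */
unsigned mid_square3(unsigned part)
{
    unsigned long sq = (unsigned long)part * part;
    return (unsigned)((sq / 10) % 1000);     /* drop 1 low digit, keep 3 */
}
```

For instance, mid_square4(9452) gives 3403 and mid_square3(379) gives 364.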
Modulo-Division:
Also known as Division-remainder.
Address = Key MOD Table size
While this algorithm works with any table size, a table size that is a prime number
generally produces fewer collisions than other table sizes.
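A minimal sketch of the division-remainder method, assuming non-negative integer keys; the name mod_hash is illustrative.

```c
/* Division-remainder hashing: address = key MOD table size.
   The name mod_hash is an assumption; keys are taken as non-negative. */
unsigned mod_hash(unsigned key, unsigned table_size)
{
    return key % table_size;
}
```

With a table size of 7, the keys 2341 and 430 from the earlier example both map to address 3, which is a collision that one of the resolution strategies has to handle.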
Folding:
There are two folding methods that are used, fold shift and fold boundary. In fold shift,
the key value is divided into parts whose size matches the size of the required address. Then the
left and right parts are shifted and added with the middle part.
In fold boundary, the left and right numbers are folded on a fixed boundary between
them and the center number.
a. Fold Shift
Key: 123456789
123
456
789
---
1368 (1 is discarded)
b. Fold Boundary
Key: 123456789
321 (digit reversed)
456
987 (digit reversed)
---
1764 ( 1 is discarded)
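Both folding variants can be sketched for the 9-digit key and 3-digit address used above. The function names are illustrative assumptions, and the sketch handles only this key and address width.

```c
/* Reverse the digits of a 3-digit part, keeping leading zeros
   (for example 012 becomes 210). */
static unsigned reverse3(unsigned part)
{
    return (part % 10) * 100 + ((part / 10) % 10) * 10 + part / 100;
}

/* Fold shift: add the three 3-digit parts and discard the carry. */
unsigned fold_shift(unsigned long key)
{
    unsigned left   = (unsigned)(key / 1000000 % 1000);
    unsigned middle = (unsigned)(key / 1000 % 1000);
    unsigned right  = (unsigned)(key % 1000);
    return (left + middle + right) % 1000;
}

/* Fold boundary: reverse the outer parts before adding. */
unsigned fold_boundary(unsigned long key)
{
    unsigned left   = (unsigned)(key / 1000000 % 1000);
    unsigned middle = (unsigned)(key / 1000 % 1000);
    unsigned right  = (unsigned)(key % 1000);
    return (reverse3(left) + middle + reverse3(right)) % 1000;
}
```

For the key 123456789, fold_shift gives 368 and fold_boundary gives 764, which are the sums 1368 and 1764 with the leading 1 discarded.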
Digit-Extraction:
Using digit extraction, selected digits are extracted from the key and used as the
address. For example, using a six-digit employee number to hash to a three-digit address
(000-999), we could select the first, third, and fourth digits (from the left) and use them as the
address.
379452 = 394
121267 = 112
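The extraction of the first, third and fourth digits can be sketched as follows, assuming a six-digit key; the name extract_134 is illustrative.

```c
/* Digit extraction for a six-digit key: take the 1st, 3rd and 4th
   digits from the left. The name extract_134 is an assumption. */
unsigned extract_134(unsigned long key)
{
    unsigned d1 = (unsigned)(key / 100000 % 10);
    unsigned d3 = (unsigned)(key / 1000 % 10);
    unsigned d4 = (unsigned)(key / 100 % 10);
    return d1 * 100 + d3 * 10 + d4;
}
```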
Non-Numeric Keys:
If the identifiers were restricted to be at most six characters long, with the first one
being a letter and the remaining ones either letters or decimal digits, then there would be
T = SUM(0 <= i <= 5) 26 * 36^i > 1.6 * 10^9
distinct possible identifiers.
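The count of possible identifiers can be checked with a small loop. This is only a sketch; the function name is an assumption.

```c
/* Count identifiers of length 1 to 6: one letter (26 choices) followed
   by up to five characters that are letters or digits (36 choices). */
unsigned long count_identifiers(void)
{
    unsigned long total = 0;
    unsigned long power = 1;          /* holds 36^i */
    int i;
    for (i = 0; i <= 5; i++) {
        total += 26UL * power;
        power *= 36UL;
    }
    return total;
}
```

The result is 1617038306, which is indeed larger than 1.6 * 10^9 and far too many for a directly indexed table.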
Static Hashing
 A bucket is a unit of storage containing one or more records (a bucket is typically a disk
block).
 The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ...,
bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk block.
 In a hash file organization we obtain the bucket of a record directly from its search-key
value using a hash function, h(K).
 The record with hash key value K is stored in bucket i, where i = h(K).
 The hash function is used to locate records for access, insertion as well as deletion.
 Records with different search-key values may be mapped to the same bucket; thus the entire
bucket has to be searched sequentially to locate a record.
 The number of primary pages is fixed; they are allocated sequentially and never
de-allocated. Overflow pages are added if needed.
h(K) mod M = bucket to which the data entry with key K belongs (M = number of buckets).
Static External Hashing

 One of the file fields is designated to be the hash key, K, of the file.
 Collisions occur when a new record hashes to a bucket that is already full.
 An overflow file is kept for storing such records. Overflow records that hash to each
bucket can be linked together.
 To reduce overflow records, a hash file is typically kept 70-80% full.
 The hash function h should distribute the records uniformly among the buckets;
otherwise, search time will be increased because many overflow records will exist.
Static Hashing (Contd.)
 The hash function works on the search key field of record r. It must distribute values over
the range 0 ... M-1.
h(K) = (a * K + b) usually works well, with the bucket taken as h(K) mod M.
a and b are constants; a lot is known about how to tune h.
 Typical hash functions perform computation on the internal binary representation of the
search-key.
For example, for a string search-key, the binary representations of all the characters in the
string could be added and the sum modulo the number of buckets could be returned.
 An ideal hash function is random, so each bucket will have
the same number of records assigned to it on average, irrespective of
the actual distribution of search-key values in the file.
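A minimal sketch of the linear form h(K) = a * K + b followed by mod M. The function name and the constants used in the usage note are arbitrary illustrative choices, not tuned values.

```c
/* Bucket number for key K using h(K) = a*K + b, then mod M.
   a and b are caller-supplied constants; M is the number of buckets.
   Unsigned arithmetic wraps around on overflow, which is acceptable
   for a hash value. */
unsigned long bucket_of(unsigned long K, unsigned long a,
                        unsigned long b, unsigned long M)
{
    return (a * K + b) % M;
}
```

With a = 3, b = 1 and M = 7, key 10 goes to bucket (3 * 10 + 1) mod 7 = 3.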
Dynamic and Extendible Hashing Techniques
 Hashing techniques are adapted to allow the dynamic growth and shrinking of the
number of file records.
These techniques include the following:
o Dynamic hashing
o Extendible hashing
o Linear hashing.
These hashing techniques use the binary representation of the hash value h(K).
 In dynamic hashing the directory is a binary tree.
 In extendible hashing the directory is an array of size 2^d where d is called the global
depth.
 The directories can be stored on disk, and they expand or shrink dynamically. Directory
entries point to the disk blocks that contain the stored records.
 An insertion in a disk block that is full causes the block to split into two blocks and the
records are redistributed among the two blocks.
 The directory is updated appropriately.
 Dynamic and extendible hashing do not require an overflow area.
 Linear hashing does require an overflow area but does not use a directory. Blocks are
split in linear order as the file expands.
Dynamic Hashing
 Good for database that grows and shrinks in size
 Allows the hash function to be modified dynamically
Extendible hashing: one form of dynamic hashing
 The hash function generates values over a large range, typically b-bit integers, with b = 32.
 At any time, only a prefix of the hash value is used to index into a table of bucket
addresses.
 Let the length of the prefix be i bits, 0 <= i <= 32.
 Bucket address table size = 2^i. Initially i = 0.
 The value of i grows and shrinks as the size of the database grows and shrinks.
 Multiple entries in the bucket address table may point to a bucket.
 Thus, the actual number of buckets is <= 2^i.
 The number of buckets also changes dynamically due to coalescing and splitting of
buckets.
General Extendable Hash Structure
Linear Hashing
 This is another dynamic hashing scheme, an alternative to Extendible Hashing.
 LH handles the problem of long overflow chains without using a directory, and handles
duplicates.
 Idea: Use a family of hash functions h0, h1, h2,...
h_i(key) = h(key) mod (2^i * N), where N is the initial number of buckets.
h is some hash function (its range is not restricted to 0 to N-1).
If N = 2^d0 for some d0, h_i consists of applying h and looking at the last d_i bits, where
d_i = d0 + i.
h_(i+1) doubles the range of h_i (similar to directory doubling).
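The family of hash functions used by linear hashing can be sketched in one line. Here hval stands for the already computed value h(key); the name lh_hash is an assumption.

```c
/* h_i(key) = h(key) mod (2^i * N).
   hval is h(key); N is the initial number of buckets.
   Assumes i is small enough that (1UL << i) * N does not overflow. */
unsigned long lh_hash(unsigned long hval, unsigned i, unsigned long N)
{
    return hval % ((1UL << i) * N);
}
```

With N = 4 and h(key) = 13 (binary 1101), h_0 gives 1 (last two bits), h_1 gives 5 (last three bits) and h_2 gives 13 (last four bits), so each level doubles the range.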
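Source code (inserting keys into a hash table with linear probing):
#include <stdio.h>
#define MAX 10
int a[MAX];   /* keys to insert; keys must be positive */
int b[MAX];   /* hash table; 0 marks an empty slot */
int main(void)
{
    int n, m, i, j, home, k;
    printf("\nEnter the array size:");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX)
        return 1;
    printf("\nEnter the table size:");
    if (scanf("%d", &m) != 1 || m < 1 || m > MAX)
        return 1;
    for (i = 0; i < n; i++)
        if (scanf("%d", &a[i]) != 1)
            return 1;
    for (i = 0; i < m; i++)
        b[i] = 0;
    for (j = 0; j < n; j++)
    {
        home = a[j] % m;            /* home address of the key */
        for (i = 0; i < m; i++)
        {
            k = (home + i) % m;     /* i-th probe, wrapping around */
            if (b[k] == 0)
            {
                b[k] = a[j];
                break;
            }
        }
        if (i == m)                 /* every slot was occupied */
            printf("\nTable full: %d not inserted", a[j]);
    }
    for (i = 0; i < m; i++)
        printf("\nb[%d]=%d", i, b[i]);
    printf("\n");
    return 0;
}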
Collision resolution:
Problem: A mapping from a potentially huge set of keys (for example, strings) to a small set of
integers cannot be unique. The hash function maps keys to indices in many-to-one fashion.
Hashing a second key into a previously used slot is called a collision.
Collision resolution deals with keys that are mapped to the same address.
 Collisions can be reduced by selecting a good hash function.
 But it is not possible to avoid collisions altogether
unless we can find a perfect hash function,
which is hard to do.
Methods:
a. Separate chaining
b. Open addressing
i. Linear probing
ii. Quadratic probing
iii. Double hashing
Separate chaining:
 The hash table has 'n' buckets, and each bucket holds a linked list of nodes.
 To insert a node into the hash table, we need to find the hash index for the given key,
which is calculated using the hash function.
Example: hashIndex = key % noOfBuckets
 Move to the bucket that corresponds to the calculated hash index and insert the new
node into that bucket's list (at the end of the list in this description; the example program
later in this section inserts at the head, which works equally well).
 To delete a node from the hash table, get the key from the user, calculate the hash index,
move to the bucket that corresponds to the calculated hash index, and search the list in that
bucket. If a node with the given key is present, remove it.
Hash table with 5 buckets. 0, 1, 2, 3 and 4 are the hash indexes
+-------+
| 0 |
+-------+
| 1 |
+-------+
| 2 |
+-------+
| 3 |
+-------+
| 4 |
+-------+
Insert a node with key 33 into the hash table.

hashIndex = 33 % 5(no of buckets)
hashIndex = 3
Hash index is 3. So, insert the new node to the bucket with hash index 3.
+-------+
| 0 |
+-------+ ----------------- ------------------
| 1 |---->| 21 | data| -|---->| 31 | data | -|----->X
+-------+ ----------------- -----------------
| 2 |
+-------+ ------------------
| 3 |--->| 33 | data| |--->X
+-------+ ------------------
| 4 |
+-------+
Delete a node with key 31 from the hash table.

hashIndex = 31 % 5(no of buckets) = 1
Move to the bucket with above calculated hash index(1), search the list in the current
bucket(bucket with index 1)to find the node with given key and delete it.
+-------+
| 0 |
+-------+ -----------------
| 1 |---->| 21 | data| -|---->X
+-------+ -----------------
| 2 |
+-------+ ------------------
| 3 |--->| 33 | data| |--->X
+-------+ ------------------
| 4 |
+-------+
Let's take a simple example. First, we start with a hash table array of strings (we'll use strings
as the data being stored and searched in this example). Let's say the hash table size is 12:
Figure: The empty hash table of strings
Next we need a hash function. There are many possible ways to construct a hash function. We'll
discuss these possibilities more in the next section. For now, let's assume a simple hash function
that takes a string as input. The returned hash value will be the sum of the ASCII characters that
make up the string mod the size of the table:
int hash(char *str, int table_size)
{
    int sum = 0;   /* must start at zero before accumulating */
    /* Make sure a valid string passed in */
    if (str == NULL) return -1;
    /* Sum up all the characters in the string */
    for ( ; *str; str++) sum += (unsigned char)*str;
    /* Return the sum mod the table size */
    return sum % table_size;
}
Now that we have a framework in place, let's try using it. First, let's store a string into the table:
"Steve". We run "Steve" through the hash function, and find that hash("Steve",12) yields 3:
Figure: The hash table after inserting "Steve"
Let's try another string: "Spark". For this walkthrough, suppose it hashes to 6 (with standard
ASCII codes the function above actually returns 9 for "Spark"; the slot numbers used here follow
the original figures). We insert it into the hash table:
Figure: The hash table after inserting "Spark"
Let's try another: "Notes". Suppose it hashes to 3 (with standard ASCII codes the function above
returns 5 for "Notes", so treat this value as illustrative). We try to insert it into the hash table:
Figure: A hash table collision

What happened? A hash function doesn't guarantee that every input will map to a different
output (in fact, as we'll see in the next section, it shouldn't do this). There is always the chance
that two inputs will hash to the same output. This indicates that both elements should be inserted
at the same place in the array, and this is impossible. This phenomenon is known as a collision.
There are many algorithms for dealing with collisions, such as linear probing and separate
chaining. While each of the methods has its advantages, we will only discuss separate chaining
in this example.
Separate chaining requires a slight modification to the data structure. Instead of storing the data
elements right into the array, they are stored in linked lists. Each slot in the array then points to
one of these linked lists. When an element hashes to a value, it is added to the linked list at that
index in the array. Because a linked list has no limit on length, collisions are no longer a
problem. If more than one element hashes to the same value, then both are stored in that linked
list.
Let's look at the above example again, this time with our modified data structure:
Figure: Modified table for separate chaining
Again, let's try adding "Steve" which hashes to 3:
Figure: After adding "Steve" to the table
And "Spark" which hashes to 6:
Figure: After adding "Spark" to the table
Now we add "Notes" which hashes to 3, just like "Steve":
Figure: Collision solved - "Notes" added to table

Once we have our hash table populated, a search follows the same steps as doing an insertion.
We hash the data we're searching for, go to that place in the array, look down the list originating
from that location, and see if what we're looking for is in the list. As long as the hash function
spreads the elements evenly and the lists stay short, the number of steps is O(1) on average.
Separate chaining allows us to solve the problem of collision in a simple yet powerful manner.
Of course, there are some drawbacks. Imagine the worst case scenario where, through some fluke
of bad luck and bad programming, every data element hashed to the same value. In that case, to
do a lookup, we'd really be doing a straight linear search on a linked list, which means that our
search operation is back to being O(n). The worst case search time for a hash table is O(n).
However, with a reasonable hash function the probability of that happening is so small that,
while the worst case search time is O(n), both the best and average cases are O(1).
Example Program To Implement Chain Hashing (in C):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct hash *hashTable = NULL;
int eleCount = 0;
struct node {
int key, age;
char name[100];
struct node *next;
};
struct hash {
struct node *head;
int count;
};
struct node * createNode(int key, char *name, int age) {
struct node *newnode;
newnode = (struct node *)malloc(sizeof(struct node));
newnode->key = key;
newnode->age = age;
strcpy(newnode->name, name);
newnode->next = NULL;
return newnode;
}
void insertToHash(int key, char *name, int age) {
int hashIndex = key % eleCount;
struct node *newnode = createNode(key, name, age);
/* head of list for the bucket with index "hashIndex" */
if (!hashTable[hashIndex].head) {
hashTable[hashIndex].head = newnode;
hashTable[hashIndex].count = 1;
return;
}
/* adding new node to the list */
newnode->next = (hashTable[hashIndex].head);
/*
* update the head of the list and no of
* nodes in the current bucket
*/
hashTable[hashIndex].head = newnode;
hashTable[hashIndex].count++;
return;
}
void deleteFromHash(int key) {
/* find the bucket using hash index */
int hashIndex = key % eleCount, flag = 0;
struct node *temp, *myNode;
/* get the list head from current bucket */
myNode = hashTable[hashIndex].head;
if (!myNode) {
printf("Given data is not present in hash Table!!\n");
return;
}
temp = myNode;
while (myNode != NULL) {
/* delete the node with given key */
if (myNode->key == key) {
flag = 1;
if (myNode == hashTable[hashIndex].head)
hashTable[hashIndex].head = myNode->next;
else
temp->next = myNode->next;
hashTable[hashIndex].count--;
free(myNode);
break;
}
temp = myNode;
myNode = myNode->next;
}
if (flag)
printf("Data deleted successfully from Hash Table\n");
else
printf("Given data is not present in hash Table!!!!\n");
return;
}
void searchInHash(int key) {
    int hashIndex = key % eleCount, flag = 0;
    struct node *myNode;
    myNode = hashTable[hashIndex].head;
    if (!myNode) {
        printf("Search element unavailable in hash table\n");
        return;
    }
    /* walk the chain of this bucket until the key is found */
    while (myNode != NULL) {
        if (myNode->key == key) {
            printf("VoterID : %d\n", myNode->key);
            printf("Name : %s\n", myNode->name);
            printf("Age : %d\n", myNode->age);
            flag = 1;
            break;
        }
        myNode = myNode->next;
    }
    if (!flag)
        printf("Search element unavailable in hash table\n");
    return;
}
void display() {
    struct node *myNode;
    int i;
    for (i = 0; i < eleCount; i++) {
        if (hashTable[i].count == 0)
            continue;
        myNode = hashTable[i].head;
        if (!myNode)
            continue;
        printf("\nData at index %d in Hash Table:\n", i);
        printf("VoterID     Name           Age\n");
        printf("--------------------------------\n");
        /* print every node in this bucket's chain */
        while (myNode != NULL) {
            printf("%-12d", myNode->key);
            printf("%-15s", myNode->name);
            printf("%d\n", myNode->age);
            myNode = myNode->next;
        }
    }
    return;
}
int main() {
int n, ch, key, age;
char name[100];
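printf("Enter the number of elements:");
/* the bucket count is used as a divisor, so it must be positive */
if (scanf("%d", &n) != 1 || n <= 0) {
printf("Invalid number of elements\n");
return 1;
}
eleCount = n;
/* create hash table with "n" no of buckets */
hashTable = (struct hash *)calloc(n, sizeof (struct hash));
if (!hashTable)
return 1;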
while (1) {
printf("\n1. Insertion\t2. Deletion\n");
printf("3. Searching\t4. Display\n5. Exit\n");
printf("Enter your choice:");
scanf("%d", &ch);
switch (ch) {
case 1:
printf("Enter the key value:");
scanf("%d", &key);
getchar();
printf("Name:");
fgets(name, 100, stdin);
name[strlen(name) - 1] = '\0';
printf("Age:");
scanf("%d", &age);
/*inserting new node to hash table */
insertToHash(key, name, age);
break;
case 2:
printf("Enter the key to perform deletion:");
scanf("%d", &key);
/* delete node with "key" from hash table */
deleteFromHash(key);
break;
case 3:
printf("Enter the key to search:");
scanf("%d", &key);
searchInHash(key);
break;
case 4:
display();
break;
case 5:
exit(0);
default:
printf("U have entered wrong option!!\n");
break;
}
}
return 0;
}
Open Addressing:
Invented by A. P. Ershov and W. W. Peterson in 1957 independently.
Idea: Store collisions in the hash table itself.
The method uses a collision resolution function in addition to the hash function.
If a collision occurs, further probes are performed following the formula:
h_i(x) = (hash(x) + f(i)) mod TableSize
where:
hash(x) is the hash function
f(i) is the collision resolution function, with f(0) = 0
i is the number of the current attempt (probe) to insert an element.
a. Linear probing (sequential probing): f(i) = i

Insert: When there is a collision we just probe the next slot in the table.
If it is unoccupied – we store the key there.
If it is occupied – we continue probing the next slot.
Search: If the key hashes to a position that is occupied and there is no match,
we probe the next position.
a) match – successful search
b) empty position – unsuccessful search
c) occupied and no match – continue probing.
When the end of the table is reached, the probing continues from the beginning,
until the original starting position is reached.
Problems with delete: a special flag is needed to distinguish deleted from empty
positions.
This is necessary for the search function – if we come to a "deleted" position,
the search has to continue as the deletion might have been done after
the insertion of the key we are looking for, and it might be further in the table.
Advantage: less total memory is needed, since no pointers are maintained.
Disadvantage: "Primary clustering"
Large clusters tend to build up: if an empty slot is preceded by i filled slots, the
probability that the empty slot is the next one to be filled is (i+1)/M.
If the preceding slot was empty, the probability is 1/M.
This means that when the table begins to fill up, many other slots are examined.
Linear probing runs slowly for nearly full tables.
b. Quadratic probing: f(i) = i^2
A quadratic function is used to compute the next index in the table to be probed.
Example:
In linear probing, if the home position h is occupied, we check position h+1,
next we check h+2, and so on.
In quadratic probing, if position h is occupied, we check h+1,
next we check h+4, next h+9, and so on (all modulo the table size).
The idea here is to skip regions in the table with possible clusters.
c. Double hashing: f(i) = i * hash2(x)
Purpose: the same as in quadratic probing, to overcome the disadvantage of clustering.
Instead of examining each successive entry following a collided position, we use
a second hash function to get a fixed increment for the "probe" sequence.
The second function should be chosen so that the increment and M (the table size) are
relatively prime. Otherwise there will be slots that remain unexamined.
Example: hash2(x) = R - (x mod R), where R is a prime smaller than TableSize. This value is
never 0, so the probe always moves.
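The probe sequence for double hashing can be sketched as below. The name dh_probe is an assumption, and so is the choice hash(x) = x mod TableSize for the first hash function; the second function is the hash2 given above.

```c
/* Position of the i-th probe under double hashing:
   h_i(x) = (hash(x) + i * hash2(x)) mod TableSize,
   assuming hash(x) = x mod TableSize and hash2(x) = R - (x mod R). */
unsigned dh_probe(unsigned x, unsigned i, unsigned table_size, unsigned R)
{
    unsigned h1 = x % table_size;
    unsigned h2 = R - (x % R);          /* always in 1..R, never 0 */
    return (h1 + i * h2) % table_size;
}
```

With TableSize = 11 and R = 7, key 49 has hash = 5 and hash2 = 7, so its probes visit slots 5, 1, 8 and so on. Because 11 is prime, every increment is relatively prime to the table size and the sequence can reach every slot.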
In open addressing the load factor L (number of stored keys divided by the table size) cannot
exceed 1.
A good strategy is to keep L < 0.5.
Rehashing
If the table is close to full, the search time grows and the number of probes may become equal
to the table size.
When the load factor exceeds a certain value (e.g. greater than 0.5) we do rehashing:
build a second table twice as large as the original
and rehash all the keys of the original table into it.
Rehashing is an expensive operation, with running time O(N).
However, once done, the new hash table will have good performance.
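Rehashing can be sketched for a small open-addressing table of positive integer keys. Everything here (the struct, the function names, linear probing, and using 0 as the empty marker) is an assumption made for illustration; the new table is exactly twice the old size, as described above.

```c
#include <stdlib.h>

/* Open-addressing table of positive int keys; 0 marks an empty slot. */
struct table {
    int *slots;
    int size;    /* number of slots */
    int count;   /* number of stored keys */
};

/* Insert with linear probing; assumes the table is not full. */
static void put(struct table *t, int key)
{
    int k = key % t->size;
    while (t->slots[k] != 0)
        k = (k + 1) % t->size;
    t->slots[k] = key;
    t->count++;
}

/* Build a table twice as large and re-insert every key: O(N).
   Returns 0 on success, -1 if memory allocation fails. */
int rehash(struct table *t)
{
    struct table bigger;
    int i;
    bigger.size = t->size * 2;
    bigger.count = 0;
    bigger.slots = calloc((size_t)bigger.size, sizeof(int));
    if (bigger.slots == NULL)
        return -1;
    for (i = 0; i < t->size; i++)
        if (t->slots[i] != 0)
            put(&bigger, t->slots[i]);
    free(t->slots);
    *t = bigger;
    return 0;
}

/* Insert a key, rehashing first if the load factor would exceed 0.5. */
int insert_key(struct table *t, int key)
{
    if (2 * (t->count + 1) > t->size && rehash(t) != 0)
        return -1;
    put(t, key);
    return 0;
}
```

Note that a key usually lands in a different slot after rehashing, because its home address is computed modulo the new table size.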
Extendible hashing
Used when the amount of data is too large to fit in main memory and external storage is used.
N records in total to store, M records in one disk block
The problem: in ordinary hashing several disk blocks may be examined to find an element -
a time consuming process.
Extendible hashing: no more than two blocks are examined.
Idea:
Keys are grouped according to the first m bits in their code.
Each group is stored in one disk block.
If a block becomes full and no more records can be inserted, its group is split into two,
and m+1 bits are considered to determine the location of a record.
Example: let's say we have 4 groups of keys according to the first two bits:
00        01        10        11
00010     01001     10001     11000
00100     01010     10100     11010
          01100
Each disk block in the example can contain only 3 records, so 4 blocks are needed to store the
above keys.
New key to be inserted: 01011.
Block 2 (group 01) is full, so we start considering 3 bits:
000/001   010       011       100/101   110/111
00010     01001     01100     10001     11000
00100     01010               10100     11010
          01011
(The pairs 000/001, 100/101 and 110/111 each still share one block.)
The second group of keys is split onto two disk blocks: one for keys starting with 010,
and one for keys starting with 011.
A directory is maintained in main memory with pointers to the disk blocks for each bit pattern.
The expected size of the directory is 2^D = O(N^(1+1/M) / M), where
D - number of bits considered
N - number of records
M - number of records that fit in one disk block.
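The grouping of keys by their leading bits can be sketched as a directory index computation. The name dir_index is an assumption, and the sketch assumes that width is smaller than the number of bits in an unsigned int and that bits is not larger than width.

```c
/* Directory index formed by the first `bits` bits of a key that is
   `width` bits wide (for the example above, width = 5). */
unsigned dir_index(unsigned key, unsigned width, unsigned bits)
{
    return key >> (width - bits);
}
```

For the new key 01011, two leading bits give index 1 (group 01), while three leading bits give index 2 (group 010). The key 01100 moves to index 3 (group 011) once three bits are considered.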
Conclusion
Hashing gives the fastest average-case search (expected constant running time) if we don't
need to keep the records sorted.
The choice of the hash function remains the most difficult part of the task and depends very
much on the nature of the keys.
Separate chaining or open addressing?
Open addressing is the preferred method if there is enough memory
to keep a table twice as large as the number of records.
Separate chaining is used when we don't know in advance the number of records to
be stored. Though it requires additional time for list processing, it is simpler to
implement.
Some application areas
Dictionaries, on-line spell checkers, compiler symbol tables.
Closed Hashing - Linear Probing
 Linear Probing resolves hash collision(same hash value for two or more data).
 It allows user to get the free space by searching the hash table sequentially.
To insert an element into the hash table, we need to find the hash index from the given key.
Example: hashIndex = key % tableSize (hash table size)
If the resultant hash index is already occupied by other data, we need to do linear probing to
find a free space in the hash table.
Example: hashIndex = (key + i) % tableSize where i = 0,1,2...
Each bucket carries a marker: 0 means the bucket was never used, 1 means it is occupied, and
-1 means its data was deleted.
To delete an element from the hash table, we need to calculate the hash index from the given
key.
hashIndex = key % tableSize
If the given key is not available at the resultant hash index, we need to probe forward until we
encounter the given key or a bucket whose marker is 0.
If we reach a bucket whose marker is 0, then the given data is not present in the hash table. If we
encounter the given key in the hash table, then we delete it and set the marker value to -1.
What is the purpose of marker in linear probing?

Just for example, try to insert the keys 21, 32 and 31 into the hash table.
hashIndex = key % tableSize(5)

hashIndex for 21 = 21 % 5 = 1
hashIndex for 32 = 32 % 5 = 2
hashIndex for 31 = 31 % 5 = 1
We get a collision for key 31 because index 1 is already occupied by 21. So, we need to
check the next available location by doing linear probing.
Check for next available location (for key 31)

hashIndex = (key + i) % 5 where i=0,1,2,...
hashIndex for 31 = (31 + 1) % 5 = 2
Hash index 2 is also occupied already (by 32). So, we need to check for the next available location.
hashIndex for 31 = (31 + 2) % 5 = 3
Hash index 3 is not occupied. So, we can insert key 31 in the bucket with index 3.
Suppose the user deletes 32 and then wants to delete 31. We get hash index 1 for the
key 31. But the bucket at hash index 1 holds the data 21. So we check the next bucket
((31 + 1) % 5 = 2), which is now empty. If a deleted bucket looked the same as a never-used
bucket, we would stop here and wrongly conclude that key 31 is not present. To avoid that,
deletion sets the marker to -1, which indicates that the data we are searching for might be
present in the subsequent buckets.
Example program for Linear Probing (in C):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int tableSize = 0, totEle = 0;

struct node *hashTable = NULL;
struct node {
int age, key;
char name[100];
int marker;
};
void insertInHash(int key, char *name, int age) {

int hashIndex = key % tableSize;
if (tableSize == totEle) {
printf("Can't perform Insertion..Hash Table is full!!");
return;
}
while (hashTable[hashIndex].marker == 1) {
hashIndex = (hashIndex + 1)%tableSize;
}
hashTable[hashIndex].key = key;
hashTable[hashIndex].age = age;
strcpy(hashTable[hashIndex].name, name);
hashTable[hashIndex].marker = 1;
totEle++;
return;
}
void deleteFromHash(int key) {
    int hashIndex = key % tableSize, count = 0, flag = 0;
    if (totEle == 0) {
        printf("Hash Table is Empty!!\n");
        return;
    }
    /* probe until a never-used bucket (marker 0) or one full cycle */
    while (hashTable[hashIndex].marker != 0 && count < tableSize) {
        /* only occupied buckets (marker 1) can hold the key */
        if (hashTable[hashIndex].marker == 1 &&
            hashTable[hashIndex].key == key) {
            hashTable[hashIndex].key = 0;
            /* set marker to -1 during deletion operation */
            hashTable[hashIndex].marker = -1;
            hashTable[hashIndex].age = 0;
            strcpy(hashTable[hashIndex].name, "\0");
            totEle--;
            flag = 1;
            break;
        }
        /* move on to the next bucket, wrapping around */
        hashIndex = (hashIndex + 1) % tableSize;
        count++;
    }
    if (flag)
        printf("Given data deleted from Hash Table\n");
    else
        printf("Given data is not available in Hash Table\n");
    return;
}
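void searchElement(int key) {
    int hashIndex = key % tableSize, flag = 0, count = 0;
    if (totEle == 0) {
        printf("Hash Table is Empty!!");
        return;
    }
    /* probe until a never-used bucket (marker 0) or one full cycle */
    while (hashTable[hashIndex].marker != 0 && count < tableSize) {
        /* skip deleted buckets (marker -1); compare only occupied ones */
        if (hashTable[hashIndex].marker == 1 &&
            hashTable[hashIndex].key == key) {
            printf("Voter ID : %d\n", hashTable[hashIndex].key);
            printf("Name : %s\n", hashTable[hashIndex].name);
            printf("Age : %d\n", hashTable[hashIndex].age);
            flag = 1;
            break;
        }
        /* move on to the next bucket, wrapping around */
        hashIndex = (hashIndex + 1) % tableSize;
        count++;
    }
    if (!flag)
        printf("Given data is not present in hash table\n");
    return;
}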
void display() {
int i;
if (totEle == 0) {
return;
}
printf("Voter ID Name Age Index \n");
printf("-----------------------------------------\n");
for (i = 0; i < tableSize; i++) {
if (hashTable[i].marker == 1) {
printf("%-13d", hashTable[i].key);
printf("%-15s", hashTable[i].name);
printf("%-7d", hashTable[i].age);
printf("%d\n", i);
}
}
printf("\n");
return;
}
int main() {
int key, age, ch;
char name[100];
printf("Enter the no of elements:");
scanf("%d", &tableSize);
hashTable = (struct node *)calloc(tableSize, sizeof(struct node));
while (1) {
printf("1. Insertion\t2. Deletion\n");
printf("3. Searching\t4. Display\n");
printf("5. Exit\nEnter ur choice:");
scanf("%d", &ch);
switch (ch) {
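case 1:
printf("Enter the key value:");
scanf("%d", &key);
getchar();
printf("Name:");
/* read the name; without this, name would be used uninitialized */
fgets(name, 100, stdin);
name[strcspn(name, "\n")] = '\0';
printf("Age:");
scanf("%d", &age);
insertInHash(key, name, age);
break;
case 2:
printf("Enter the key to perform deletion:");
scanf("%d", &key);
deleteFromHash(key);
break;
case 3:
printf("Enter the key to search:");
scanf("%d", &key);
searchElement(key);
break;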
case 4:
display();
break;
case 5:
exit(0);
default:
printf("U have entered wrong Option!!\n");
break;
}
}
return 0;
}
Quadratic Probing:
Quadratic probing insertion
The problem here is to insert a key into an available slot of a given hash table using
quadratic probing.
Algorithm to insert key in hash table
1. Get the key k
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If hashtable[h[k]] is empty
(4.1) Insert key k at hashtable[h[k]]
(4.2) Stop
Else
(4.3) The slot at hashtable[h[k]] is occupied, so we need to find the next available slot
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 while j is less than the SIZE of the hash table
5. No free slot was found in SIZE probes (the table is full, or the probe sequence never
reaches a free slot)
6. Stop
C function for key insertion

#define SIZE 10 /* number of slots; this value is an assumption, the notes leave SIZE undefined */

int quadratic_probing_insert(int *hashtable, int key, int *empty){
    /* hashtable[] is an integer hash table; empty[] is another array which
       indicates whether each slot is free (1) or occupied (0).
       If a free slot is found, the function returns the index of the bucket
       where the key is inserted; otherwise it returns (-1). */
    int j = 0, hk;
    hk = key % SIZE;
    while(j < SIZE) {
        if(empty[hk] == 1){
            hashtable[hk] = key;
            empty[hk] = 0;
            return (hk);
        }
        j++;
        hk = (key + j * j) % SIZE;
    }
    return (-1);
}
Quadratic probing search
Algorithm to search element in hash table
1. Get the key k to be searched
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If the key space at hashtable[h[k]] is occupied
(4.1) Compare the element at hashtable[h[k]] with the key k.
(4.2) If they are equal
(4.2.1) The key is found at the bucket h[k]
(4.2.2) Stop
Else
(4.3) The element might be placed at the next location given by the quadratic function
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 until j is equal to the SIZE of the hash table
5. The key was not found in the hash table
6. Stop
C function for key searching

int quadratic_probing_search(int *hashtable, int key, int *empty)
{
    /* If the key is found in the hash table, the function returns the index of the
       hashtable where the key is stored; otherwise it returns (-1). */
    int j = 0, hk;
    hk = key % SIZE;
    while(j < SIZE)
    {
        if((empty[hk] == 0) && (hashtable[hk] == key))
            return (hk);
        j++;
        hk = (key + j * j) % SIZE;
    }
    return (-1);
}
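As a worked trace of the two routines above, the sketch below inserts three keys that all hash to the same index and then looks them up. The table size of 10, the keys 23, 33 and 43, and the driver name quadratic_demo are assumptions chosen for this example; the two probing functions are compact copies of the ones shown above so that the block compiles on its own.

```c
#define SIZE 10   /* assumed table size for this example */

/* Compact copy of the insertion routine shown above */
int quadratic_probing_insert(int *hashtable, int key, int *empty) {
    int j = 0, hk = key % SIZE;
    while (j < SIZE) {
        if (empty[hk] == 1) {
            hashtable[hk] = key;
            empty[hk] = 0;
            return hk;
        }
        j++;
        hk = (key + j * j) % SIZE;
    }
    return -1;
}

/* Compact copy of the search routine shown above */
int quadratic_probing_search(int *hashtable, int key, int *empty) {
    int j = 0, hk = key % SIZE;
    while (j < SIZE) {
        if (empty[hk] == 0 && hashtable[hk] == key)
            return hk;
        j++;
        hk = (key + j * j) % SIZE;
    }
    return -1;
}

/* Hypothetical driver: marks every key space empty, inserts 23, 33 and 43
   (all of which hash to index 3) and records where each key was placed. */
void quadratic_demo(int *hashtable, int *empty, int placed[3]) {
    int keys[3] = {23, 33, 43};
    int i;
    for (i = 0; i < SIZE; i++)
        empty[i] = 1;
    for (i = 0; i < 3; i++)
        placed[i] = quadratic_probing_insert(hashtable, keys[i], empty);
}
```

Key 23 goes to index 3. Key 33 collides there and moves to (33 + 1) % 10 = 4. Key 43 collides at 3 and at (43 + 1) % 10 = 4, then settles at (43 + 4) % 10 = 7. A later search for 43 follows the same three probes and returns 7, while a search for a key that was never inserted, such as 53, returns (-1).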
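Closed Hashing - Double Hashing
• Double hashing is a popular hashing technique in which the interval between probes is
calculated by a second hash function.
• It resolves hash collisions (two or more keys with the same hash value). Because the probe
interval depends on the key, it also reduces the clustering seen with linear probing.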
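The probe sequence can be written as a single formula. The sketch below assumes the same two hash functions as the example program in this section, key % tableSz for the starting bucket and subHash - (key % subHash) for the step; the helper name double_hash_probe and the probe counter i are introduced here only for illustration.

```c
/* Hypothetical helper (not from the original program): index of the i-th
   probe for a key under double hashing.
     h1(key) = key % tableSz               -> starting bucket
     h2(key) = subHash - (key % subHash)   -> step size, always in 1..subHash
   Assumes key >= 0, tableSz prime and 0 < subHash < tableSz. */
int double_hash_probe(int key, int i, int tableSz, int subHash) {
    int h1 = key % tableSz;
    int h2 = subHash - (key % subHash);
    return (h1 + i * h2) % tableSz;
}
```

For tableSz = 7 and subHash = 4, keys 10 and 17 both start at bucket 3, but key 10 steps by 2 (3, 5, 0, 2, 4, 6, 1) while key 17 steps by 3 (3, 6, 2, ...), so the two keys separate after their first collision. Since the table size is prime and the step is smaller than it, the sequence for key 10 visits every bucket once before repeating.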
Example Program To Implement Double Hashing (in C):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One bucket of the hash table */
struct node {
    int age, key;
    char name[100];
    int marker;   /* 0 = never used, 1 = occupied, -1 = deleted */
};

int tableSz = 0, totElements = 0, subHash = 0;
struct node *hashBucket = NULL;
void insertIntoHashTable(int key, char *name, int age) {
    /* 1st hashing - finding hash index */
    int hashInd = key % tableSz, hashVal;
    /* 2nd hashing - no of buckets to skip while probing */
    hashVal = subHash - (key % subHash);
    if (tableSz == totElements) {
        printf("Can't perform Insertion..Hash Table is full!!\n");
        return;
    }
    /* double hashing probe: step by hashVal until a bucket that is
       unused or deleted (marker != 1) is found */
    while (hashBucket[hashInd].marker == 1) {
        hashInd = (hashInd + hashVal) % tableSz;
    }
    hashBucket[hashInd].key = key;
    hashBucket[hashInd].age = age;
    strcpy(hashBucket[hashInd].name, name);
    hashBucket[hashInd].marker = 1;
    totElements++;
}
void deleteFromHashTable(int key) {
    int hashInd = key % tableSz, count = 0, flag = 0, hashVal;
    hashVal = subHash - (key % subHash);
    if (totElements == 0) {
        printf("Hash Table is Empty!!\n");
        return;
    }
    /* follow the probe sequence until a never-used bucket is reached */
    while (hashBucket[hashInd].marker != 0 && count <= tableSz) {
        /* only occupied buckets can match; deleted ones are skipped */
        if (hashBucket[hashInd].marker == 1 && hashBucket[hashInd].key == key) {
            hashBucket[hashInd].key = 0;
            hashBucket[hashInd].marker = -1;   /* mark the bucket as deleted */
            hashBucket[hashInd].age = 0;
            hashBucket[hashInd].name[0] = '\0';
            totElements--;
            flag = 1;
            break;
        }
        hashInd = (hashInd + hashVal) % tableSz;
        count++;
    }
    if (flag)
        printf("Given data deleted from Hash Table\n");
    else
        printf("Given data is not available in Hash Table\n");
}
void searchData(int key) {
    int hashInd = key % tableSz, flag = 0, count = 0, hashVal = 0;
    hashVal = subHash - (key % subHash);
    if (totElements == 0) {
        printf("Hash Table is Empty!!\n");
        return;
    }
    /* follow the probe sequence until a never-used bucket is reached */
    while (hashBucket[hashInd].marker != 0 && count <= tableSz) {
        if (hashBucket[hashInd].marker == 1 && hashBucket[hashInd].key == key) {
            printf("Voter ID : %d\n", hashBucket[hashInd].key);
            printf("Name : %s\n", hashBucket[hashInd].name);
            printf("Age : %d\n", hashBucket[hashInd].age);
            flag = 1;
            break;
        }
        hashInd = (hashInd + hashVal) % tableSz;
        count++;
    }
    if (!flag)
        printf("Given data is not present in hash table\n");
}
void display() {
    int i;
    if (totElements == 0) {
        printf("Hash Table is Empty!!\n");
        return;
    }
    printf("Voter ID     Name           Age    Index \n");
    printf("-----------------------------------------\n");
    /* print only the occupied buckets */
    for (i = 0; i < tableSz; i++) {
        if (hashBucket[i].marker == 1) {
            printf("%-13d", hashBucket[i].key);
            printf("%-15s", hashBucket[i].name);
            printf("%-7d", hashBucket[i].age);
            printf("%d\n", i);
        }
    }
    printf("\n");
}
int main() {
    int key, age, ch, i, flag = 0;
    char name[100];
    printf("Enter the no of elements:");
    scanf("%d", &tableSz);
    /* round the table size up to the next prime greater than 2 */
    while (1) {
        for (i = 2; i < tableSz; i++) {
            if (tableSz % i == 0) {
                flag = 1;
                break;
            }
        }
        if (!flag && tableSz > 2)
            break;
        flag = 0;
        tableSz++;
    }
    /* calculating sub-hash value */
    subHash = (tableSz % 2 == 0) ? tableSz / 2 : (tableSz + 1) / 2;
    /* allocating memory for hash bucket (calloc sets every marker to 0) */
    hashBucket = (struct node *)calloc(tableSz, sizeof(struct node));
    if (hashBucket == NULL) {
        printf("Memory allocation failed\n");
        return 1;
    }
    while (1) {
        printf("1. Insertion\t2. Deletion\n");
        printf("3. Searching\t4. Display\n");
        printf("5. Exit\nEnter your choice:");
        scanf("%d", &ch);
        switch (ch) {
        case 1:
            printf("Enter the key value:");
            scanf("%d", &key);
            getchar();
            printf("Name:");
            fgets(name, 100, stdin);
            name[strcspn(name, "\n")] = '\0';   /* drop the newline kept by fgets */
            printf("Age:");
            scanf("%d", &age);
            insertIntoHashTable(key, name, age);
            break;
        case 2:
            printf("Enter the key to delete:");
            scanf("%d", &key);
            deleteFromHashTable(key);
            break;
        case 3:
            printf("Enter the key to search:");
            scanf("%d", &key);
            searchData(key);
            break;
        case 4:
            display();
            break;
        case 5:
            exit(0);
        default:
            printf("You have entered a wrong option!\n");
            break;
        }
    }
    return 0;
}
Open addressing vs. chaining

Criterion | Chaining | Open addressing
Collision resolution | Using external data structure | Using hash table itself
Memory waste | Pointer size overhead per entry (storing list heads in the table) | No overhead
Performance dependence on table's load factor | Directly proportional | Proportional to (loadFactor) / (1 - loadFactor)
Can store more items than the hash table size | Yes | No. Moreover, it is recommended to keep the table's load factor below 0.7
Hash function requirements | Uniform distribution | Uniform distribution, should avoid clustering
Handle removals | Removals are ok | Removals clog the hash table with "DELETED" entries
Implementation | Simple | Correct implementation of an open addressing based hash table is quite tricky
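
The load-factor row of the comparison can be made concrete with a little arithmetic. The sketch below evaluates only the two proportionality terms named in the comparison; it does not give exact probe counts, and the function names chaining_cost and open_addressing_cost are assumptions introduced for this illustration.

```c
/* Hypothetical helpers (names are not from the original notes).
   alpha is the load factor: stored items / number of buckets. */

/* Chaining: the cost term grows in direct proportion to alpha. */
double chaining_cost(double alpha) {
    return alpha;
}

/* Open addressing: the cost term grows like alpha / (1 - alpha).
   Only defined for 0 <= alpha < 1; returns -1.0 outside that range. */
double open_addressing_cost(double alpha) {
    if (alpha < 0.0 || alpha >= 1.0)
        return -1.0;
    return alpha / (1.0 - alpha);
}
```

At a load factor of 0.5 the open addressing term is 1, at 0.7 it is about 2.3, and at 0.9 it is 9, while the chaining term only rises from 0.5 to 0.9 over the same range. This steep growth near a full table is consistent with the advice to keep the load factor of an open addressing table below 0.7.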
