Syllabus:
1. Hashing
2. General Idea
3. Hash Function
4. Collision resolution
5. Separate Chaining
6. Open Addressing
7. Linear Probing
8. Double hashing
9. Bucket hashing
10. Priority Queues (Heaps)
11. Binary Heap
Why do we use hashing?
We have all used a dictionary, and many of us have a word processor equipped with a
limited dictionary, that is, a spelling checker. We consider the dictionary as an ADT. Examples
of dictionaries are found in many applications, including the spelling checker, the thesaurus, the
data dictionary found in database management applications, and the symbol tables generated by
loaders, assemblers, and compilers.
In computer science, we generally use the term symbol table rather than dictionary, when
referring to the ADT. Viewed from this perspective, we define the symbol table as a set of
name-attribute pairs. The characteristics of the name and attribute vary according to the
application. For example, in a thesaurus, the name is a word, and the attribute is a list of
synonyms for the word; in a symbol table for a compiler, the name is an identifier, and the
attributes might include an initial value and a list of lines that use the identifier.
Generally we would want to perform the following operations on any symbol table:
(1) Determine if a particular name is in the table
(2) Retrieve the attributes of that name
(3) Modify the attributes of that name
(4) Insert a new name and its attributes
(5) Delete a name and its attributes
There are only three basic operations on symbol tables:
1. Searching,
2. Inserting,
3. Deleting.
The technique used for these basic operations is hashing. Unlike search tree methods, which rely on
identifier comparisons to perform a search, hashing relies on a formula called the hash function.
Definition:
Hashing provides a function 'h', called a hash function (or randomizing function),
that is applied to the hash field value of a record and yields the address of the disk block in which
the record is stored. A hash table is a table that can be searched for an item in O(1) time on
average by using a hash function to form an address from the key.
Features of Hashing:
Hashing is an approach for storing and searching data, so its main features concern how
the data is placed:
Randomizing:
Spreading the data or records randomly over the whole storage space.
Collision:
When two different keys hash to the same address. This is the
major problem in hashing and is discussed in a later section.
Limitations:
Hashing provides very fast access to records on certain search conditions. A file organized
this way is usually called a hash file.
The search condition must be an equality condition on a single field, called the hash field
of the file. The hash field is also called the hash key.
The idea behind hashing is also used as an internal search structure within a program whenever a
group of records is accessed exclusively by using the value of one field.
Examples:
Given the values {2341, 4234, 2839, 430, 22, 397, 3920}, a hash table of size 7, and hash
function h(x) = x mod 7, show the resulting tables after inserting the values in the given order
with each of these collision strategies.
Hashing Functions:
Several kinds of uniform hashing functions are in use.
Direct hashing:
The key is the address without any algorithmic manipulation. The data structure must
therefore contain an element for every possible key. While the situations where direct hashing can
be used are limited, it is very powerful when applicable because it guarantees that there are no collisions.
Limitations: impractical for large key values, since the table needs one element per possible key.
Mid-Square (middle of Square):
The key is squared and the middle digits of the square are used as the address.
9452 * 9452 = 89340304, address = 3403
As a variation on the mid-square method, we can select a portion of the key, such as
three of its digits (the leading three in the examples below), and square that portion rather
than the whole key. This allows the method to be used when the key is too large to square.
379452: 379 * 379 = 143641, address = 364
121267: 121 * 121 = 014641, address = 464
Modulo-Division:
Also known as Division-remainder.
Address = Key MOD Table size
While this algorithm works with any table size, a table size that is a prime number
produces fewer collisions than other table sizes.
Folding:
There are two folding methods that are used, fold shift and fold boundary. In fold shift,
the key value is divided into parts whose size matches the size of the required address. Then the
left and right parts are shifted and added with the middle part.
In fold boundary, the left and right numbers are folded on a fixed boundary between
them and the center number.
a. Fold Shift
Key: 123456789
123
456
789
---
1368 (1 is discarded)
b. Fold Boundary
Key: 123456789
321 (digit reversed)
456
987 (digit reversed)
---
1764 ( 1 is discarded)
Digit-Extraction:
Using digit extraction, selected digits are extracted from the key and used as the
address. For example, using a six-digit employee number to hash to a three-digit address
(000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.
379452 -> 394
121267 -> 112
Non-Numeric Keys:
If identifiers are restricted to at most six characters, with the first one
being a letter and the remaining ones either letters or decimal digits, then the number of
distinct possible identifiers is
T = \sum_{i=0}^{5} 26 \cdot 36^i > 1.6 \times 10^9.
Static Hashing
A bucket is a unit of storage containing one or more records (a bucket is typically a disk
block).
The file blocks are divided into M equal-sized buckets, numbered bucket 0, bucket 1, ...,
bucket M-1. Typically, a bucket corresponds to one (or a fixed number of) disk blocks.
In a hash file organization we obtain the bucket of a record directly from its search-key
value using a hash function, h(K).
The record with hash key value K is stored in bucket i, where i = h(K).
The hash function is used to locate records for access, insertion, and deletion.
Records with different search-key values may be mapped to the same bucket; thus the entire
bucket has to be searched sequentially to locate a record.
Primary pages are fixed, allocated sequentially, and never de-allocated; overflow pages are added if needed.
h(K) mod M = bucket to which the data entry with key K belongs (M = number of buckets).
Linear Hashing
This is another dynamic hashing scheme, an alternative to Extendible Hashing.
LH handles the problem of long overflow chains without using a directory, and handles
duplicates.
Idea: Use a family of hash functions h_0, h_1, h_2, ...
h_i(key) = h(key) mod (2^i N); N = initial number of buckets
h is some hash function (its range is not limited to 0 to N-1)
If N = 2^{d_0} for some d_0, h_i consists of applying h and looking at the last d_i bits,
where d_i = d_0 + i.
h_{i+1} doubles the range of h_i (similar to directory doubling).
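Source code:
#include <stdio.h>
#define MAX 10
int a[MAX];   /* input keys (assumed positive) */
int b[MAX];   /* hash table; 0 marks an empty slot */
int main(void)
{
    int n, m, i, j, k;
    printf("\nEnter the array size:");
    if (scanf("%d", &n) != 1 || n < 1 || n > MAX)
        return 1;
    printf("\nEnter the table size:");
    if (scanf("%d", &m) != 1 || m < 1 || m > MAX)
        return 1;
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);
    for (i = 0; i < m; i++)
        b[i] = 0;
    /* insert each key with linear probing: try (key % m + i) % m for i = 0..m-1 */
    for (j = 0; j < n; j++)
    {
        for (i = 0; i < m; i++)
        {
            k = (a[j] % m + i) % m;
            if (b[k] == 0)
            {
                b[k] = a[j];
                break;
            }
        }
    }
    for (i = 0; i < m; i++)
        printf("\nb[%d]=%d", i, b[i]);
    return 0;
}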
Collision resolution:
Problem: A mapping from a potentially huge set of strings to a small set of integers
cannot be unique. The hash function maps keys into indices in many-to-one fashion. Hashing a
second key into a previously used slot is called a collision.
Collision resolution deals with keys that are mapped to the same address.
Collisions can be reduced by selecting a good hash function,
but they cannot be avoided altogether
unless we can find a perfect hash function,
which is hard to do.
Methods:
a. Separate chaining
b. Open addressing
i. Linear probing
ii. Quadratic probing
iii. Double hashing
Separate chaining:
The hash table has 'n' buckets, each holding a linked list of nodes.
To insert a node into the hash table, we first find the hash index for the given key,
which is calculated using the hash function.
Example: hashIndex = key % noOfBuckets
Move to the bucket that corresponds to the calculated hash index and insert the new
node at the end of the list.
To delete a node from the hash table, get the key from the user, calculate the hash index,
move to the bucket that corresponds to the calculated hash index, and search the list in that
bucket. If a node with the given key is present, remove it.
Hash table with 5 buckets. 0, 1, 2, 3 and 4 are the hash indexes
+-------+
| 0 |
+-------+
| 1 |
+-------+
| 2 |
+-------+
| 3 |
+-------+
| 4 |
+-------+
+-------+
| 0 |
+-------+ ----------------- ------------------
| 1 |---->| 21 | data| -|---->| 31 | data | -|----->X
+-------+ ----------------- -----------------
| 2 |
+-------+ ------------------
| 3 |--->| 33 | data| |--->X
+-------+ ------------------
| 4 |
+-------+
Let's take a simple example. First, we start with a hash table array of strings (we'll use strings
as the data being stored and searched in this example). Let's say the hash table size is 12:
[Figure: the empty hash table of strings]
Next we need a hash function. There are many possible ways to construct a hash function. We'll
discuss these possibilities more in the next section. For now, let's assume a simple hash function
that takes a string as input. The returned hash value will be the sum of the ASCII codes of the
characters that make up the string, mod the size of the table.
Let's try a string: "Spark". The ASCII codes of S, p, a, r, k are 83, 112, 97, 114, 107, which
sum to 513, and 513 mod 12 = 9, so hash("Spark",12) yields 9. We insert it into the hash table
at index 9:
[Figure: the hash table after inserting "Spark"]
Let's try another: "Notes". The codes 78, 111, 116, 101, 115 sum to 521, and 521 mod 12 = 5, so
hash("Notes",12) is 5. We insert it into the hash table at index 5.
hashTable[hashIndex].count--;
free(myNode);
break;
}
temp = myNode;
myNode = myNode->next;
}
if (flag)
printf("Data deleted successfully from Hash Table\n");
else
printf("Given data is not present in hash Table!!!!\n");
return;
}
void display() {
struct node *myNode;
int i;
for (i = 0; i < eleCount; i++) {
if (hashTable[i].count == 0)
continue;
myNode = hashTable[i].head;
if (!myNode)
continue;
printf("\nData at index %d in Hash Table:\n", i);
printf("VoterID Name Age \n");
printf("--------------------------------\n");
while (myNode != NULL) {
printf("%-12d", myNode->key);
printf("%-15s", myNode->name);
printf("%d\n", myNode->age);
myNode = myNode->next;
}
}
return;
}
int main() {
int n, ch, key, age;
char name[100];
printf("Enter the number of elements:");
scanf("%d", &n);
eleCount = n;
/* create hash table with "n" no of buckets */
hashTable = (struct hash *)calloc(n, sizeof (struct hash));
while (1) {
printf("\n1. Insertion\t2. Deletion\n");
printf("3. Searching\t4. Display\n5. Exit\n");
printf("Enter your choice:");
scanf("%d", &ch);
switch (ch) {
case 1:
printf("Enter the key value:");
scanf("%d", &key);
getchar();
printf("Name:");
fgets(name, 100, stdin);
name[strcspn(name, "\n")] = '\0'; /* strip the trailing newline, if any */
printf("Age:");
scanf("%d", &age);
/*inserting new node to hash table */
insertToHash(key, name, age);
break;
case 2:
printf("Enter the key to perform deletion:");
scanf("%d", &key);
/* delete node with "key" from hash table */
deleteFromHash(key);
break;
case 3:
printf("Enter the key to search:");
scanf("%d", &key);
searchInHash(key);
break;
case 4:
display();
break;
case 5:
exit(0);
default:
printf("You have entered a wrong option!!\n");
break;
}
}
return 0;
}
Open Addressing:
Invented independently by A. P. Ershov and W. W. Peterson in 1957.
Idea: Store colliding keys in the hash table itself.
The method uses a collision resolution function in addition to the hash function.
If a collision occurs, further probes are performed following the formula:
h_i(x) = (hash(x) + f(i)) mod TableSize
where:
hash(x) is the hash function
f(i) is the collision resolution function, with f(0) = 0
i is the number of the current attempt (probe) to insert an element.
Directory bit patterns: 00 01 10 11
The second group of keys is split onto two disk blocks: one for keys starting with 010,
and one for keys starting with 011.
A directory is maintained in main memory with pointers to the disk blocks for each bit pattern.
The size of the directory is 2^D = O(N^{1+1/M} / M), where
D - number of bits considered
N - number of records
M - number of disk blocks.
Conclusion
Hashing is the best search method (constant expected running time) when we don't need to keep
the records sorted.
The choice of the hash function remains the most difficult part of the task and depends very
much on the nature of the keys.
Separate chaining or open addressing?
Open addressing is the preferred method if there is enough memory
to keep a table twice as large as the number of records.
Separate chaining is used when we don't know in advance the number of records to
be stored. Though it requires additional time for list processing, it is simpler to
implement.
Some application areas
Dictionaries, on-line spell checkers, compiler symbol tables.
Closed Hashing - Linear Probing
Linear probing resolves hash collisions (the same hash value for two or more keys)
by searching the hash table sequentially for a free slot.
To insert an element into the hash table, we need to find the hash index from the given key.
Example: hashIndex = key % tableSize
If the resulting hash index is already occupied by another element, we do linear probing to
find a free slot in the hash table.
Example: hashIndex = (key + i) % tableSize where i = 0,1,2...
To delete an element from the hash table, we first calculate the hash index from the given
key.
hashIndex = key % tableSize
If the given key is not at the resulting hash index, we probe forward until we
encounter the given key or a bucket whose marker is 0.
If the marker of a bucket is 0, the given data is not present in the hash table. If we
encounter the given key, we delete it and set the marker of its bucket to -1.
Example (tableSize = 5, with key 21 at index 1 and key 32 at index 2): inserting 31
causes a collision because index 1 (31 % 5) is already occupied by 21. So, we need to
check the next available location by linear probing.
hashIndex for 31 = (31+1) % 5 = 2.
Hash index 2 is also occupied. So, we check the next location.
hashIndex for 31 = (31+2) % 5 = 3.
Hash index 3 is not occupied. So, we can insert key 31 in bucket 3.
Suppose the user deletes 32 and then wants to delete 31. We get hash index 1 for the
key 31, but the bucket at hash index 1 holds 21. So, we check the next bucket
((31+1) % 5 = 2), which is now empty. Without a marker, we might conclude that key 31 is not
present. To avoid that, deletion sets the marker to -1, which indicates
that the data we are looking for might be present in subsequent buckets.
struct node {
int age, key;
char name[100];
int marker;
};
if (flag)
printf("Given data deleted from Hash Table\n");
else
printf("Given data is not available in Hash Table\n");
return;
}
if (!flag)
printf("Given data is not present in hash table\n");
return;
}
void display() {
int i;
if (totEle == 0) {
printf("Hash Table is Empty!!\n");
return;
}
printf("Voter ID Name Age Index \n");
printf("-----------------------------------------\n");
for (i = 0; i < tableSize; i++) {
if (hashTable[i].marker == 1) {
printf("%-13d", hashTable[i].key);
printf("%-15s", hashTable[i].name);
printf("%-7d", hashTable[i].age);
printf("%d\n", i);
}
}
printf("\n");
return;
}
int main() {
int key, age, ch;
char name[100];
printf("Enter the no of elements:");
scanf("%d", &tableSize);
hashTable = (struct node *)calloc(tableSize, sizeof(struct node));
while (1) {
printf("1. Insertion\t2. Deletion\n");
printf("3. Searching\t4. Display\n");
printf("5. Exit\nEnter your choice:");
scanf("%d", &ch);
switch (ch) {
case 1:
printf("Enter the key value:");
scanf("%d", &key);
getchar();
printf("Name:");
fgets(name, 100, stdin);
name[strcspn(name, "\n")] = '\0'; /* strip the trailing newline, if any */
printf("Age:");
scanf("%d", &age);
insertInHash(key, name, age);
break;
case 2:
printf("Enter the key value:");
scanf("%d", &key);
deleteFromHash(key);
break;
case 3:
printf("Enter the key value:");
scanf("%d", &key);
searchElement(key);
break;
case 4:
display();
break;
case 5:
exit(0);
default:
printf("You have entered a wrong option!!\n");
break;
}
}
return 0;
}
Quadratic Probing:
Quadratic probing insertion
The problem here is to insert a key at an available slot in a given hash table using
quadratic probing.
Algorithm to insert key in hash table
1. Get the key k
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If hashtable[h[k]] is empty
(4.1) Insert key k at hashtable[h[k]]
(4.2) Stop
Else
(4.3) The slot at hashtable[h[k]] is occupied, so we need to find the next available
slot
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 until j is equal to the SIZE of the hash table
5. No free slot was found in SIZE probes (the table is full, or the probe sequence missed the free slots)
6. Stop
/* Insert key using quadratic probing; returns the slot index, or -1 on failure.
   The function name is assumed; hashtable[], empty[] and SIZE are expected
   to be declared elsewhere in the program. */
int quadraticInsert(int key) {
    int j = 0, hk;
    hk = key % SIZE;
    while (j < SIZE) {
        if (empty[hk] == 1) {      /* free slot found */
            hashtable[hk] = key;
            empty[hk] = 0;
            return (hk);
        }
        j++;
        hk = (key + j * j) % SIZE; /* next slot in the quadratic sequence */
    }
    return (-1);
}
Quadratic probing search
Algorithm to search element in hash table
1. Get the key k to be searched
2. Set counter j = 0
3. Compute hash function h[k] = k % SIZE
4. If the slot at hashtable[h[k]] is occupied
(4.1) Compare the element at hashtable[h[k]] with the key k.
(4.2) If they are equal
(4.2.1) The key is found at the bucket h[k]
(4.2.2) Stop
Else
(4.3) The element might be placed at the next location given by the quadratic function
(4.4) Increment j
(4.5) Compute new hash function h[k] = ( k + j * j ) % SIZE
(4.6) Repeat Step 4 until j is equal to the SIZE of the hash table
5. The key was not found in the hash table
6. Stop
/* Search for key using quadratic probing; returns the slot index, or -1 if absent.
   The function name is assumed; hashtable[], empty[] and SIZE are expected
   to be declared elsewhere in the program. */
int quadraticSearch(int key) {
    int j = 0, hk;
    hk = key % SIZE;
    while (j < SIZE) {
        if ((empty[hk] == 0) && (hashtable[hk] == key))
            return (hk);
        j++;
        hk = (key + j * j) % SIZE; /* next slot in the quadratic sequence */
    }
    return (-1);
}
Double Hashing:
Double hashing is a popular hashing technique in which the interval between probes is
calculated by a second hash function.
It resolves hash collisions (two or more keys with the same hash value).
if (totElements == 0) {
return;
if (hashBucket[hashInd].key == key) {
hashBucket[hashInd].key = 0;
hashBucket[hashInd].marker = -1;
hashBucket[hashInd].age = 0;
strcpy(hashBucket[hashInd].name, "\0");
totElements--;
flag = 1;
break;
count++;
if (flag)
else
return;
if (totElements == 0) {
return;
if (hashBucket[hashInd].key == key) {
flag = 1;
break;
}
if (!flag)
return;
void display() {
    int i;
    if (totElements == 0) {
        return; /* nothing to print for an empty table */
    }
    printf("-----------------------------------------\n");
    for (i = 0; i < tableSz; i++) {
        if (hashBucket[i].marker == 1) {
            printf("%-13d", hashBucket[i].key);
            printf("%-15s", hashBucket[i].name);
            printf("%-7d", hashBucket[i].age);
            printf("%d\n", i);
        }
    }
    printf("\n");
    return;
}
int main() {
    int key, age, ch, i, flag = 0;
    char name[100];
    scanf("%d", &tableSz);
    /* round the table size up to the next prime number */
    while (1) {
        for (i = 2; i < tableSz; i++) {
            if (tableSz % i == 0) {
                flag = 1;
                break;
            }
        }
        if (!flag)
            break;
        flag = 0;
        tableSz++;
    }
    /* hashBucket must be allocated with tableSz entries before the menu loop */
    while (1) {
        scanf("%d", &ch);
        switch (ch) {
        case 1:
            printf("Enter the key value:");
            scanf("%d", &key);
            getchar();
            printf("Name:");
            fgets(name, 100, stdin);
            name[strcspn(name, "\n")] = '\0'; /* strip the trailing newline, if any */
            printf("Age:");
            scanf("%d", &age);
            /* insert (key, name, age) into the hash table here */
            break;
        case 2:
            scanf("%d", &key);
            deleteFromHashTable(key);
            break;
        case 3:
            scanf("%d", &key);
            searchData(key);
            break;
        case 4:
            display();
            break;
        case 5:
            exit(0);
        default:
            break;
        }
    }
    return 0;
}
[Figure: collision resolution strategies: using an external data structure, or using the hash table itself]