
Hashing

Lec12- spring 2019


PRESENTED BY: DR EMAD NABIL
Lecture Overview

• Introduction to Hashing
• Hash functions
• Distribution of records among addresses
• Synonyms and collisions
• Collision resolution by progressive overflow or linear probing
Motivation
• Hashing is a useful searching technique which can be used for implementing
indexes.

• The main motivation for Hashing is improving searching time.

• The search time for hashing compares to other methods as follows:
• Binary search: O(log2 N)
• Hashing: O(1) on average
What is Hashing?
• The idea is to discover the location of a key by simply examining the
key.
• For that we need to design a hash function.

• A hash function is a function h(k) that transforms a key k into an address.

key → h(k) → address
• An address space is chosen beforehand.
• For example, we may decide the file will have 1000 available addresses.

• If U is the set of all possible keys, the hash function maps U to the
address space {0, 1, ..., 999}.
Hashing Example
Collision

• LOWELL, LOCK, OLIVER, and any word with first two letters L and O will be
mapped to the same address
h(LOWELL) = h(LOCK) = h(OLIVER) = 4
• These keys are called synonyms
• The address "4" is said to be the home address of any of these keys.

• Two different keys may be sent to the same address, generating a collision.
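The slide that defined the example hash function did not survive, so the following minimal sketch assumes a rule that reproduces the stated result h(LOWELL) = h(LOCK) = h(OLIVER) = 4: multiply the ASCII codes of the first two letters and keep the remainder modulo the 1000-address space. The function name and the rule itself are assumptions, not necessarily the lecture's definition.

```python
def first_two_letters_hash(key: str, num_addresses: int = 1000) -> int:
    """Hypothetical hash: product of the ASCII codes of the first two letters."""
    product = ord(key[0]) * ord(key[1])  # 'L' = 76, 'O' = 79, so 76 * 79 = 6004
    return product % num_addresses       # remainder modulo the address space


for name in ("LOWELL", "LOCK", "OLIVER"):
    # All three keys are synonyms: each one lands on home address 4.
    print(name, first_two_letters_hash(name))
```

Because multiplication is commutative, "LO..." and "OL..." give the same product, which is why OLIVER collides with LOWELL under this assumed rule.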
Collision resolution
• Avoiding collisions is extremely difficult
• Ways of reducing collisions
1. Spread out the records by choosing a good hash function

2. Use extra memory, i.e. increase the size of the address space.
ex: reserve 5000 available addresses rather than 1000

3. Put more than one record at a single address (use of buckets).


• Addresses generated from the keys should be uniformly and randomly distributed.

• The hashing function must minimize collisions.

1. Division Method
2. Multiplication Method
3. Extraction Method
4. Mid-Square Hashing
5. Folding Technique
6. Rotation
7. Universal Hashing

1. Division Method
• One of the required features of a hash function is that the resulting index
must be within the table index range.

• One simple choice for a hash function is modulus division, written MOD
(the operator % in C/C++).

• The function returns an integer.

• If any parameter is NULL, the result is NULL.

• Hash(Key) = Key % m, where m is the hash table size (for example, m = 307).
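A minimal sketch of the division method, using the table size m = 307 from the slide. The function name and the sample key 1000 are illustrative assumptions.

```python
def division_hash(key: int, m: int = 307) -> int:
    """Division method: the remainder of key / m always lies in 0 .. m-1."""
    return key % m


print(division_hash(1000))  # 1000 = 3 * 307 + 79, so the address is 79
```

Note that in C/C++ the % operator can return a negative value for a negative key, whereas Python's % is non-negative for a positive m.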
2. Multiplication Method
• The multiplication method works as follows:
1. Multiply the key k by a constant A in the range 0 < A < 1.
2. Extract the fractional part of kA.
3. Multiply this value by m and take the floor of the result.

h(k) = floor(m * (kA mod 1))

where (kA mod 1) denotes the fractional part of kA, that is, kA − floor(kA), and m is the hash table size.

The optimal choice of A depends on the characteristics of the data being hashed. Knuth recommends
A ≈ (√5 − 1)/2 = 0.6180339887...


2. Multiplication Method Example
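As a worked example, the sketch below applies the three steps of the multiplication method with Knuth's suggested constant A = (√5 − 1)/2. The key 123456 and the table size m = 16384 are illustrative values chosen here, not taken from the slide, and the function name is an assumption.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant, about 0.618


def multiplication_hash(key: int, m: int) -> int:
    """h(k) = floor(m * (k*A mod 1))."""
    fractional = (key * A) % 1.0      # steps 1 and 2: fractional part of k*A
    return math.floor(m * fractional)  # step 3: scale by m and take the floor


# 123456 * A is about 76300.0041, fractional part about 0.0041,
# and 16384 * 0.0041 is about 67.4, so the address is 67.
print(multiplication_hash(123456, 16384))  # 67
```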
3. Extraction Method
• When a portion of the key is used for the address calculation, the
technique is called the extraction method.

• In digit extraction, a few digits are selected and extracted from the
key and used as the address.

Key      Hashed Address
345678   357
234137   243
952671   927
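The table is consistent with taking the first, third, and fifth digits of each key. A minimal sketch under that reading (the function name and the `positions` parameter are assumptions):

```python
def digit_extraction_hash(key: int, positions=(0, 2, 4)) -> int:
    """Build the address from selected digit positions (0-based) of the key."""
    digits = str(key)
    return int("".join(digits[p] for p in positions))


for key in (345678, 234137, 952671):
    print(key, digit_extraction_hash(key))  # 357, 243, 927
```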

4. Mid-Square Hashing
• Mid-square hashing squares the key and extracts the middle digits of the
squared key as the address.
• The difficulty arises when the key is large. Because the entire key participates in the address
calculation, the square of a large key may exceed the storage limit.
• So mid-square is used when the key size is less than or equal to 4 digits.

Key    Square     Hashed Address
2341   5480281    802
1671   2792241    922
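A minimal sketch of mid-square hashing that reproduces the two rows of the table by taking the three middle digits of the 7-digit square. The function name and the `address_digits` parameter are assumptions.

```python
def mid_square_hash(key: int, address_digits: int = 3) -> int:
    """Square the key and keep the middle `address_digits` digits."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2  # index where the middle begins
    return int(squared[start:start + address_digits])


print(mid_square_hash(2341))  # 2341^2 = 5480281 -> 802
print(mid_square_hash(1671))  # 1671^2 = 2792241 -> 922
```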

• The difficulty of storing the square of a large number can be overcome by squaring
only a few digits of the key instead of the whole key.
• If the key is large, we select a portion of it and square that portion.

Keys and addresses obtained by extracting a few digits, squaring them, and extracting the middle digits again:

Key      Square                 Hashed Address
234137   234 x 234 = 054756     475
567187   567 x 567 = 321489     148
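A minimal sketch of this variant, assuming the portion is the first three digits of the key and that the address is the third through fifth digits of the six-digit square, as in the 567187 row. The function name and both choices are assumptions.

```python
def partial_mid_square_hash(key: int, portion_digits: int = 3) -> int:
    """Square only the leading digits of the key, then take middle digits."""
    portion = int(str(key)[:portion_digits])                    # 567187 -> 567
    squared = str(portion * portion).zfill(2 * portion_digits)  # 567 * 567 -> "321489"
    return int(squared[2:5])                                    # digits 3 to 5 -> 148


print(partial_mid_square_hash(567187))  # 148
```

Squaring a three-digit portion never needs more than six digits, so the storage problem of squaring the full key does not arise.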

5. Folding Technique
• The subparts of the key can each be the same size as the address.
To compute this hash function, apply 3 steps:
• Step 1. Transform the key into a number.
• Step 2. Fold and add, taking the result mod a prime number.

• Step 3. Divide by the size of the address space (preferably a prime number) and use the remainder as the address.
• Dividing by a number that has many small factors may result in lots of collisions.
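The three steps can be sketched as follows. Several details are assumptions made for illustration and are not given on the slide: keys are padded with spaces to 12 characters, ASCII codes are folded in pairs, the intermediate prime is 19937, and the address space has 101 addresses.

```python
def fold_and_add_hash(key: str, num_addresses: int = 101,
                      fold_prime: int = 19937, width: int = 12) -> int:
    """Fold-and-add hashing over pairs of ASCII codes (assumed parameters)."""
    padded = key.ljust(width)[:width]  # step 1: fixed-width key, space padded
    total = 0
    for i in range(0, width, 2):
        # Combine two ASCII codes into one number, e.g. 'L','O' -> 76, 79 -> 7679.
        pair = ord(padded[i]) * 100 + ord(padded[i + 1])
        total = (total + pair) % fold_prime  # step 2: fold, add, mod a prime
    return total % num_addresses             # step 3: remainder is the address


# LOWELL folds to 7679 + 8769 + 7676 + 3232 + 3232 + 3232 (mod 19937) = 13883,
# and 13883 mod 101 = 46.
print(fold_and_add_hash("LOWELL"))  # 46
```

Taking the intermediate sum mod a prime keeps it small enough to store in a fixed-size integer in languages where overflow is a concern.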
6. Rotation
• When keys are serial, they vary only in the last digit, which leads to the creation of synonyms.
• Rotating the key minimizes this problem. This method is used along with other methods.
• Here, the key is rotated right by one digit; folding is then applied to avoid synonyms.

• For example, let the key be 120605. When it is rotated we get 512060.
• The address is then calculated using any other hash function.
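A minimal sketch of the rotation step, assuming six-digit keys. The function name and the `width` parameter are assumptions.

```python
def rotate_right(key: int, width: int = 6) -> int:
    """Move the last digit of the key to the front."""
    digits = str(key).zfill(width)
    return int(digits[-1] + digits[:-1])


print(rotate_right(120605))  # 512060
print(rotate_right(120606))  # 612060
```

After rotation, serial keys differ in their leading digit rather than their last one, so a subsequent folding step no longer groups them together as easily.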

Some Other Hashing Methods
• Radix Transformation
• Transform the number into another base, then divide
by the maximum address and use the remainder.
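A minimal sketch of radix transformation. The target base 11, the maximum address 99, and the sample key 453 are illustrative assumptions, as is the function name.

```python
def radix_transform_hash(key: int, base: int = 11, max_address: int = 99) -> int:
    """Rewrite the key in another base, read the digits as decimal, then divide."""
    transformed, place = 0, 1
    while key > 0:
        key, digit = divmod(key, base)  # peel off the lowest digit in the new base
        transformed += digit * place    # write that digit into a decimal numeral
        place *= 10
    return transformed % max_address    # remainder is the address


# 453 in base 11 is 382 (3*121 + 8*11 + 2), and 382 mod 99 = 85.
print(radix_transform_hash(453))  # 85
```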
If Hash(Key1) = Hash(Key2), then Key1 and Key2 are synonyms and a collision happens.

Assume the hash value is the RRN (relative record number) and that we are working with fixed-length records.
Distribution of Records among Addresses

• Uniform distributions are extremely rare.


• Random distributions are acceptable and more easily obtainable.
1. Open addressing
The first collision resolution method, open addressing, resolves collisions in the home area. When a collision
occurs, the home area addresses are searched for an open or unoccupied element where the new data can be
placed. Examples of Open Addressing Methods:
1.1. Linear probing or progressive overflow
1.2. Quadratic probing
1.3. Double hashing

2. Bucket hashing (defers collision but does not prevent it)


3. Separate chaining
4. Separate chaining with overflow area

1.1. Open addressing: progressive overflow or linear probing

H = F(key) is the home address. If it is available, we store the record there; otherwise, we increase H by k,
H = (H + k) mod tableSize, (k ≥ 1),
and repeat until a free address is found.
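A minimal sketch of insertion with linear probing. It assumes the division method for F(key) and a table represented as a Python list with None marking free addresses; the function name and sample keys are illustrative.

```python
from typing import Optional


def linear_probe_insert(table: list, key: int, k: int = 1) -> Optional[int]:
    """Store key at its home address or at the next free address k steps on."""
    size = len(table)
    h = key % size  # F(key): division method assumed for the home address
    for _ in range(size):
        if table[h] is None:
            table[h] = key
            return h
        h = (h + k) % size  # H = (H + k) mod tableSize
    return None  # no free address was reached


table = [None] * 7
for key in (10, 17, 24):  # all three keys have home address 3
    print(key, "->", linear_probe_insert(table, key))  # 3, then 4, then 5
```

The three synonyms end up in adjacent addresses 3, 4, and 5, which is how clusters start to form.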
Collision Resolution: Progressive Overflow
• Advantage
• Simplicity
• Disadvantage
• If there are many collisions, clusters of adjacent records can form, which makes
later searches longer.
1.2. Quadratic Probing
H = F(key)
H_i = (H + i^2) % tablesize, i ≥ 1

• Quadratic probing: if there is a collision at the home address H,
this method probes the table at locations H+1, H+4, H+9, ...,
• that is, at locations (H + i^2) mod tablesize for i = 1, 2, ....
• That is, the increment function is i^2.
• Quadratic probing substantially reduces clustering, but it is not guaranteed to probe all
locations in the table.
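A minimal sketch of quadratic probing, assuming the division method for F(key) and offsets i^2 measured from the home address. The function names, the table size 7, and the sample keys are illustrative assumptions.

```python
def quadratic_probe_sequence(home: int, table_size: int) -> list:
    """Addresses (home + i^2) mod table_size for i = 1 .. table_size - 1."""
    return [(home + i * i) % table_size for i in range(1, table_size)]


def quadratic_probe_insert(table: list, key: int):
    """Try the home address first, then the quadratic probe sequence."""
    size = len(table)
    home = key % size  # F(key): division method assumed
    for slot in [home] + quadratic_probe_sequence(home, size):
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # every probed address was occupied


print(quadratic_probe_sequence(3, 7))  # [4, 0, 5, 5, 0, 4]
```

With home address 3 in a table of size 7, only addresses 3, 4, 0, and 5 are ever probed; addresses 1, 2, and 6 are never reached, which illustrates why an insert can fail even when the table has free space.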
1.3. Double Hashing


H = F1(key) → to compute the home address

Step = F2(key)
Table size = M
H_i = (H + i*Step) % M, i ≥ 0; repeat this until we find a free place or we reach the starting point again.

Double hashing represents an improvement over linear or quadratic probing

Double Hashing uses nonlinear probing by computing different probe increments for different keys.
It uses two functions.

The first function computes the original address, if the slot is available (or the record is found) we stop there,
otherwise, we apply the second hashing function to compute the step value.
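A minimal sketch of insertion with double hashing. The two hash functions are assumptions chosen for illustration: F1(key) = key mod M and F2(key) = 1 + (key mod (M − 1)), which keeps the step non-zero. A prime table size M is also assumed so that every step value visits all addresses.

```python
def double_hash_insert(table: list, key: int):
    """Probe home, home + step, home + 2*step, ... (all mod M)."""
    m = len(table)
    home = key % m            # F1: assumed division method
    step = 1 + key % (m - 1)  # F2: assumed second hash, never zero
    for i in range(m):        # after m probes we are back at the start point
        slot = (home + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    return None  # no free address found


table = [None] * 7
for key in (10, 17, 24):  # synonyms: all have home address 3
    print(key, "->", double_hash_insert(table, key))  # 3, then 2, then 4
```

The three keys share a home address but have different steps (5, 6, and 1), so their probe sequences diverge instead of piling up in adjacent addresses.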