Hashing Part 1 Lecture

Hashing
Lecture 12, Spring 2019

Presented by: Dr. Emad Nabil
Lecture Overview
• Introduction to Hashing
• Hash functions
• Distribution of records among addresses
• Synonyms and collisions
• Collision resolution by progressive overflow (linear probing)
Motivation
• Hashing is a useful searching technique which can be used for implementing
indexes.
• The main motivation for Hashing is improving searching time.
• The search time of hashing compares with that of binary search as follows:
• Binary search: O(log2 N)
• Hashing: O(1) on average
What is Hashing ?
• The idea is to discover the location of a record by simply examining its key.
• For that we need to design a hash function.
• A hash function is a function h(k) that transforms a key k into an address:
key k → h(k) → address
• An address space is chosen beforehand.
• For example, we may decide the file will have 1000 available addresses.
• If U is the set of all possible keys, the hash function maps U to the address space {0, 1, ..., 999}.
Hashing Example
Collision
• LOWELL, LOCK, OLIVER, and any other word whose first two letters are L and O (in either order) are mapped to the same address:
h(LOWELL) = h(LOCK) = h(OLIVER) = 4
• These keys are called synonyms.
• The address "4" is said to be the home address of each of these keys.
• Two different keys may be sent to the same address, generating a collision.
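The hash function behind this example is not stated in the text. As a sketch only, one function that reproduces the values above multiplies the ASCII codes of the first two letters and keeps the last three digits; the function name and the 1000-address space used here are assumptions for illustration.

```python
def first_two_letters_hash(key: str, address_space: int = 1000) -> int:
    """Hypothetical hash: product of the ASCII codes of the first two
    letters, reduced to the range 0 .. address_space - 1."""
    product = ord(key[0]) * ord(key[1])
    return product % address_space


# L = 76 and O = 79, so any key starting with "LO" or "OL" gives
# 76 * 79 = 6004, and 6004 % 1000 = 4.
synonyms = [first_two_letters_hash(name) for name in ("LOWELL", "LOCK", "OLIVER")]
```

Because multiplication is commutative, the order of the two letters does not matter, which is why OLIVER collides with LOWELL and LOCK.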
Collision resolution
• Avoiding collisions is extremely difficult
• Ways of reducing collisions
1. Spread out the records by choosing a good hash function
2. Use extra memory, i.e. increase the size of the address space.
Example: reserve 5000 available addresses rather than 1000.
3. Put more than one record at a single address (use of buckets).

• A good hash function generates addresses from the keys that are uniformly and randomly distributed over the address space.
• The hash function must minimize collisions.
Common hash functions:
1. Division Method
2. Multiplication Method
3. Extraction Method
4. Mid-Square Hashing
5. Folding Technique
6. Rotation
7. Universal Hashing
1. Division Method
• One required feature of a hash function is that the resulting index must lie within the table's index range.
• One simple choice for a hash function is modulus division, written MOD (the operator % in C/C++).
• The function returns an integer.
• If any parameter is NULL, the result is NULL.
• Hash(Key) = Key % m, where m is the hash table size (for example, m = 307).
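A minimal sketch of the division method follows, assuming integer keys and the table size m = 307 from the slide. The function name is hypothetical, and Python's None stands in for NULL.

```python
def division_hash(key, m=307):
    """Division method: the remainder of key / m is always in 0 .. m - 1."""
    # Mirror the NULL rule: a missing parameter yields a missing result.
    if key is None or m is None:
        return None
    return key % m


address = division_hash(345678)  # 345678 = 307 * 1125 + 303, so address is 303
```

The remainder is always smaller than m, so the result is guaranteed to be a valid table index.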
2. Multiplication Method
• The multiplication method works as follows:
1. Multiply the key k by a constant A in the range 0 < A < 1.
2. Extract the fractional part of kA.
3. Multiply this value by m and take the floor of the result.
• That is, h(k) = ⌊m (kA mod 1)⌋, where (kA mod 1) denotes the fractional part of kA, i.e. kA − ⌊kA⌋, and m is the hash table size.
• The optimal choice of A depends on the characteristics of the data being hashed. Knuth recommends A ≈ (√5 − 1)/2 ≈ 0.6180339887.
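As an illustration, the three steps can be sketched as below. The constant A = (√5 − 1)/2 is the value commonly attributed to Knuth; the key 123456 and the table size 16384 are assumed sample values, not taken from the slides.

```python
import math

# Constant commonly attributed to Knuth: (sqrt(5) - 1) / 2 ~ 0.6180339887
A = (math.sqrt(5) - 1) / 2


def multiplication_hash(key: int, m: int, a: float = A) -> int:
    """Multiplication method: floor(m * fractional_part(key * a))."""
    fractional = (key * a) % 1.0      # steps 1 and 2: fractional part of k*A
    return math.floor(m * fractional)  # step 3: scale by m and take the floor


sample = multiplication_hash(123456, 16384)
```

Unlike the division method, the value of m is not critical here, so a power of two is a common choice.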
3. Extraction Method
• When only a portion of the key is used to calculate the address, the technique is called the extraction method.
• In digit extraction, a few digits are selected and extracted from the key and used as the address. In the table below, the first, third, and fifth digits of the key form the address.

Key       Hashed Address
345678    357
234137    243
952671    927
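The table values are consistent with taking the first, third, and fifth digits of each key. A short sketch under that assumption (the function name and the `positions` parameter are hypothetical):

```python
def digit_extraction_hash(key: int, positions=(0, 2, 4)) -> int:
    """Build the address from selected digits of the key.

    positions are 0-based indexes into the decimal digits of the key;
    the default picks the 1st, 3rd, and 5th digits.
    """
    digits = str(key)
    return int("".join(digits[p] for p in positions))


addresses = [digit_extraction_hash(k) for k in (345678, 234137, 952671)]
```

Choosing which digits to extract is a design decision: digits that vary most across the key set spread the records best.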
4. Mid-Square Hashing
• Mid-square hashing squares the key and extracts the middle digits of the square as the address.
• The difficulty arises when the key is large. Because the entire key participates in the address calculation, the square of a large key may exceed the storage limit.
• So mid-square hashing is used when the key has at most 4 digits.

Key     Square     Hashed Address
2341    5480281    802
1671    2792241    922

• The difficulty of storing the square of a large number can be overcome by squaring only a few digits of the key instead of the whole key.
• If the key is large, we can select a portion of it and square only that portion.
• Keys and addresses obtained by extracting a few digits, squaring them, and again extracting the middle digits:

Key       Square                  Hashed Address
234137    234 x 234 = 054756      475
567187    567 x 567 = 321489      148
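Both mid-square variants can be sketched as follows. The helper names are hypothetical, and the rule for picking the "middle" three digits (leaning right when the square has an even number of digits) is an assumption chosen to match 2341 → 802 and 567187 → 148.

```python
def middle_digits(value: int, count: int = 3, width: int = 0) -> int:
    """Return `count` digits from the middle of `value`.

    The value is zero-padded on the left to `width` digits first, so that
    short squares keep their position (e.g. 54756 is read as 054756).
    """
    text = str(value).zfill(width)
    start = (len(text) - count + 1) // 2
    return int(text[start:start + count])


def mid_square_hash(key: int, count: int = 3) -> int:
    """Square the whole key and take the middle digits."""
    return middle_digits(key * key, count)


def partial_mid_square_hash(key: int, prefix_len: int = 3, count: int = 3) -> int:
    """Square only the first `prefix_len` digits of a large key."""
    part = int(str(key)[:prefix_len])
    return middle_digits(part * part, count, width=2 * prefix_len)
```

For example, `mid_square_hash(2341)` squares the key to 5480281 and returns 802, while `partial_mid_square_hash(567187)` squares 567 to 321489 and returns 148.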
5. Folding Technique
• The key is split into subparts, each of which can be the same size as the address.
• To compute this hash function, apply three steps:
• Step 1. Transform the key into a number.
• Step 2. Fold and add, taking the result mod a prime number.
• Step 3. Divide by the size of the address space (preferably a prime number) and use the remainder as the address.
• Dividing by a number that has many small factors may result in many collisions.
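One possible reading of the three steps is sketched below for string keys. Several details are assumptions rather than part of the slides: keys are padded with spaces to 12 characters, each pair of characters is folded into one number from its ASCII codes, the intermediate prime is 19937, and the address space has 101 addresses.

```python
def fold_hash(key: str, table_size: int = 101, width: int = 12,
              prime: int = 19937) -> int:
    """Fold-and-add hash for string keys (width must be even)."""
    # Step 1: pad/truncate the key so it splits into 2-character parts.
    padded = key.ljust(width)[:width]
    total = 0
    for i in range(0, width, 2):
        # Turn each pair of characters into a number, e.g. "LO" -> 7679.
        # (This reads as a concatenation for 2-digit codes such as
        # uppercase letters and spaces.)
        part = ord(padded[i]) * 100 + ord(padded[i + 1])
        # Step 2: fold and add, keeping the running sum below the prime.
        total = (total + part) % prime
    # Step 3: reduce to the address space.
    return total % table_size
```

For the key "LOWELL" the parts are 7679, 8769, 7676, 3232, 3232, 3232; the folded sum mod 19937 is 13883, and 13883 mod 101 gives address 46.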
6. Rotation
• When keys are serial, they differ only in the last digit, which leads to the creation of synonyms.
• Rotating the key minimizes this problem. This method is used together with other methods.
• Here, the key is rotated right by one digit, and folding is then applied, which helps avoid synonyms.
• For example, if the key is 120605, rotating it gives 512060.
• The address is then calculated using any other hash function.
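A small sketch of the rotation step, assuming decimal integer keys (the function name is hypothetical):

```python
def rotate_right(key: int) -> int:
    """Move the last decimal digit of the key to the front."""
    digits = str(key)
    # Note: a key ending in 0 loses its leading zero after conversion
    # back to int (e.g. 120600 -> 012060 -> 12060).
    return int(digits[-1] + digits[:-1])


rotated = [rotate_right(k) for k in (120605, 120606, 120607)]
# rotated is [512060, 612060, 712060]
```

After rotation, serial keys differ in their leading digit rather than their last one, so a later folding or extraction step is less likely to map them to the same address.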
Some Other Hashing Methods
• Radix Transformation
• Transform the number into another base, then divide by the maximum address and use the remainder as the address.
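As a sketch, assuming base 11 and a maximum address of 99 (both illustrative values), radix transformation could look like this. Rereading the converted digits as a decimal number, including how a digit value of 10 carries over, is a simplifying assumption.

```python
def radix_hash(key: int, base: int = 11, max_address: int = 99) -> int:
    """Convert the key to `base`, reread the digits as decimal, then divide."""
    digits = []
    n = key
    while n > 0:
        digits.append(n % base)   # least significant digit first
        n //= base
    transformed = 0
    for d in reversed(digits):
        transformed = transformed * 10 + d
    return transformed % max_address
```

For example, 453 in base 11 is 382 (3 x 121 + 8 x 11 + 2), and 382 mod 99 gives address 85.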
• If Hash(Key1) = Hash(Key2), then Key1 and Key2 are synonyms and a collision occurs.
• Assume the hash value is the RRN (relative record number) and that we are working with fixed-length records.
Distribution of Records among Addresses
• Uniform distributions are extremely rare.

• Random distributions are acceptable and more easily obtainable.
1. Open addressing
The first collision resolution method, open addressing, resolves collisions in the home area. When a collision occurs, the home area addresses are searched for an open (unoccupied) element where the new data can be placed. Examples of open addressing methods:
1.1. Linear probing (progressive overflow)
1.2. Quadratic probing
1.3. Double hashing
2. Bucket hashing (defers collisions but does not prevent them)
3. Separate chaining
4. Separate chaining with overflow area
1.1. Open Addressing: Progressive Overflow (Linear Probing)
• H = F(key) is the home address. If it is available, we store the record there; otherwise, we increase H by k,
H = (H + k) mod tableSize, (k ≥ 1),
and repeat until a free address is found.
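The probing rule can be sketched as below, assuming the table is a Python list in which None marks a free address. The function names and the `step` parameter (the k above, defaulting to 1) are hypothetical.

```python
def linear_probe_insert(table, key, hash_fn, step=1):
    """Store key at its home address, or at the next free address after it."""
    size = len(table)
    home = hash_fn(key) % size
    h = home
    while True:
        if table[h] is None:
            table[h] = key
            return h
        h = (h + step) % size          # H = (H + k) mod tableSize
        if h == home:                  # wrapped around without success
            raise RuntimeError("no free address on the probe sequence")


def linear_probe_search(table, key, hash_fn, step=1):
    """Return the address of key, or None if it is not in the table."""
    size = len(table)
    home = hash_fn(key) % size
    h = home
    while table[h] is not None:
        if table[h] == key:
            return h
        h = (h + step) % size
        if h == home:
            break
    return None


table = [None] * 11
# 20, 31 and 42 all have home address 9 when hashed with key % 11.
slots = [linear_probe_insert(table, k, lambda k: k % 11) for k in (20, 31, 42)]
# slots is [9, 10, 0]: the probe wraps around the end of the table.
```

The search follows the same probe sequence as the insert and stops at the first free address, which is why records cannot simply be blanked out on deletion in this scheme.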
Collision Resolution: Progressive Overflow
• Advantage
• Simplicity
• Disadvantage
• If there are many collisions, clusters of records can form, which lengthens the search for a free address.
1.2. Quadratic Probing
• H = F(key) is the home address.
• H_i = (H + i^2) % tablesize, i ≥ 1
• If there is a collision at hash address H, this method probes the table at locations H + 1, H + 4, H + 9, ...,
• that is, at locations (H + i^2) mod tablesize for i = 1, 2, ....
• That is, the increment function is i^2.
• Quadratic probing substantially reduces clustering, but it is not guaranteed to probe all locations in the table.
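A sketch of quadratic probing under the same assumptions as before (a list with None for free addresses, hypothetical function names). The second helper shows which addresses are reachable from a given home address.

```python
def quadratic_probe_insert(table, key, hash_fn):
    """Probe home, home + 1, home + 4, home + 9, ... (mod table size)."""
    size = len(table)
    home = hash_fn(key) % size
    for i in range(size):
        slot = (home + i * i) % size
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free address on the quadratic probe sequence")


def probed_addresses(home, size):
    """Set of distinct addresses the quadratic sequence can reach."""
    return {(home + i * i) % size for i in range(size)}


table = [None] * 11
slots = [quadratic_probe_insert(table, k, lambda k: k % 11) for k in (20, 31, 42)]
# slots is [9, 10, 2]: the third key jumps past the cluster at 9 and 10.
reachable = probed_addresses(0, 11)
# reachable is {0, 1, 3, 4, 5, 9}: only 6 of the 11 addresses.
```

With a table of size 11, only six addresses are reachable from any home address, which illustrates why an insert can fail even though the table still has free space.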
1.3. Double Hashing
• H = F1(key) computes the home address.
• Step = F2(key) computes the probe increment.
• M is the table size.
• H_i = (H + i * Step) % M, i ≥ 0. Repeat this until we find a free place or arrive back at the starting point.
• Double hashing represents an improvement over linear or quadratic probing.
• Double hashing uses nonlinear probing by computing different probe increments for different keys.
• It uses two functions. The first function computes the home address; if that slot is available (or the record is found there), we stop. Otherwise, we apply the second hash function to compute the step value.
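The procedure can be sketched as follows. The two hash functions used in the example, F1(key) = key % M and F2(key) = 1 + key % (M − 1), are common textbook choices assumed here for illustration; the slides do not specify them.

```python
def double_hash_insert(table, key, f1, f2):
    """Probe home, home + step, home + 2*step, ... (mod M) for a free slot."""
    m = len(table)
    home = f1(key) % m
    step = f2(key)
    i = 0
    while True:
        slot = (home + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return slot
        i += 1
        if (home + i * step) % m == home:   # back at the starting point
            raise RuntimeError("no free address on the probe sequence")


M = 11
table = [None] * M
f1 = lambda key: key % M              # assumed first hash function
f2 = lambda key: 1 + key % (M - 1)    # assumed second hash function, never 0
slots = [double_hash_insert(table, k, f1, f2) for k in (20, 31, 42)]
# slots is [9, 0, 1]: the keys share home address 9 but use steps 1, 2 and 3.
```

Because synonyms get different step values, they follow different probe sequences instead of piling up behind one another. Choosing M prime and a step that is never zero lets the sequence reach every address before returning to the start.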
