Uploaded by Latif Siddiq Sunny

Hashing is a well known data structure for searching data in O(1) complexity.

© All Rights Reserved

DECEMBER 1, 2017

LATIF SIDDIQ SUNNY

Hashing

Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects.

Suppose, we have a large table of data. In this data table, we want to insert, remove, and search data.

If we keep the data in a sorted array, an item can be found in O(log(n)) time using Binary Search, but insert and remove operations become costly because we have to maintain the sorted order.

If we use a linked list (sorted or unsorted), search takes O(n) time, so any insert or remove that must first locate an item becomes costly as well.

With a Balanced Binary Search Tree (for example, an AVL Tree or a Red-Black Tree), we get moderate search, insert, and delete times. These operations can be guaranteed to run in O(log(n)) time.

Insertion, search and removal in O(log(n)) is good, but as the table grows even this cost becomes significant. We would like to be able to find an item in O(1) time. For that, we use Hashing.

So, hashing is a technique suited to workloads dominated by insertion and search: it lets us insert data and search for it in O(1) time on average. Using this technique, we store data in a Hash table.

Hash Function

A hash function maps a big number or string to a small integer that can be used as an index in the hash table. A good hash function should have the following properties:

1. Efficiently computable.

2. Should uniformly distribute the keys (each table position equally likely for each key).

The hash function maps the search key to an index; the index gives the place in the hash table where the corresponding record should be stored and where it will later be looked up.
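The mapping from a key to an index can be sketched as follows. This is only an illustration: the function names `string_hash` and `int_hash`, and the multiplier 31, are assumptions made for this example and do not come from the notes.

```python
def string_hash(key: str, table_size: int) -> int:
    """Map a string to a slot index in range(table_size).

    Polynomial hash; the multiplier 31 is an arbitrary choice for this sketch.
    """
    h = 0
    for ch in key:
        # Fold each character into the running value, keeping it small.
        h = (h * 31 + ord(ch)) % table_size
    return h


def int_hash(key: int, table_size: int) -> int:
    """Map a (possibly large) non-negative integer to a slot index."""
    return key % table_size


if __name__ == "__main__":
    for word in ("apple", "banana", "cherry"):
        print(word, "->", string_hash(word, 11))
    print(123456789, "->", int_hash(123456789, 11))
```

Both functions are cheap to compute and always return an index that fits the table, which is the first property listed above; how uniformly they spread real keys depends on the data.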

Hashing Techniques

We can use a Direct Access Table for hashing. We build a large array and use the key itself as the index, i.e. the hashing function h(k) = k.

If space is not a concern, we can build such a Direct Access Table. If T[1, 2, …, m] is our table, where m is the highest key we can store, we can do insert, remove and search operations in O(1) time.

Advantage:

Disadvantage:

2. We cannot store large keys, because there is a limit on how large an array we can allocate.
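A minimal sketch of a Direct Access Table, assuming non-negative integer keys no larger than a known maximum. The class name `DirectAccessTable` and its method names are hypothetical, chosen only for this example.

```python
class DirectAccessTable:
    """Array indexed directly by the key (assumes 0 <= key <= max_key)."""

    def __init__(self, max_key: int):
        # One slot per possible key; this is where the space cost comes from.
        self.slots = [None] * (max_key + 1)

    def insert(self, key: int, value) -> None:
        self.slots[key] = value          # O(1)

    def search(self, key: int):
        return self.slots[key]           # O(1); None means "not present"

    def remove(self, key: int) -> None:
        self.slots[key] = None           # O(1)


if __name__ == "__main__":
    table = DirectAccessTable(max_key=10)
    table.insert(3, "three")
    print(table.search(3))
    table.remove(3)
    print(table.search(3))
```

Every operation is a single array access, but the array must be as long as the largest key, which is the size limitation described above.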

Since we cannot have an arbitrarily large array, we can use a small array and modify the hash function:

h(k) = k % m, where k is the key and m is the size of the array

But there is a chance of collision. For example, let m = 7. If we insert 6, its hash value is 6 % 7 = 6, so we store 6 in slot 6 of the array. If we then insert 13, its hash value is 13 % 7 = 6, but this slot is not empty: a collision occurs. Although we have overcome the limitation of a large array, we cannot avoid such collisions.
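The collision in this example can be reproduced with a few lines. This is a sketch only; the helper name `try_insert` is an assumption, and it simply refuses to overwrite an occupied slot.

```python
def try_insert(table: list, key: int) -> bool:
    """Store key at index key % len(table); return False on a collision."""
    slot = key % len(table)
    if table[slot] is not None:
        return False        # slot already taken: collision
    table[slot] = key
    return True


if __name__ == "__main__":
    table = [None] * 7                 # m = 7
    print(try_insert(table, 6))        # 6 % 7 = 6, slot is free
    print(try_insert(table, 13))       # 13 % 7 = 6, slot is occupied
```

The techniques that follow (separate chaining and open addressing) are different answers to what should happen when `try_insert` would fail.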

Separate chaining

In this method we use the same hash function described above, but this time each array slot holds a pointer to the head of a linked list. If there is no data in a slot, the head is null. Whenever a key hashes to a slot, we insert it at the end of that slot's linked list.

In this scheme colliding keys no longer compete for a single slot, but within a slot we have to search the chain linearly.
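A minimal sketch of separate chaining with h(k) = k % m. For brevity it uses Python lists as the chains instead of hand-written linked lists, which is a substitution made for this example; the class name `ChainedHashTable` is also an assumption.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each slot holds a chain of keys."""

    def __init__(self, m: int = 7):
        self.m = m
        # An empty list plays the role of a null head pointer.
        self.slots = [[] for _ in range(m)]

    def insert(self, key: int) -> None:
        # Append at the end of the chain for this slot.
        self.slots[key % self.m].append(key)

    def search(self, key: int) -> bool:
        # Linear scan of one chain only.
        return key in self.slots[key % self.m]

    def remove(self, key: int) -> bool:
        chain = self.slots[key % self.m]
        if key in chain:
            chain.remove(key)
            return True
        return False


if __name__ == "__main__":
    table = ChainedHashTable(7)
    table.insert(6)
    table.insert(13)           # collides with 6, goes into the same chain
    print(table.slots[6])
    print(table.search(13))
```

Keys 6 and 13 end up in the same chain, so both can be stored, at the cost of a linear scan inside that chain.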

Advantages:

1. Simple to implement.

2. The hash table never fills up; we can always add more elements to a chain.

3. Less sensitive to the hash function or load factors.

4. It is mostly used when it is unknown how many and how frequently keys may be inserted or

deleted.

Disadvantages:

1. Cache performance of chaining is not good as keys are stored using linked list. Open addressing

provides better cache performance as everything is stored in same table.

2. Wastage of Space (Some Parts of hash table are never used)

3. If the chain becomes long, then search time can become O(n) in worst case.

4. Uses extra space for links.

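Analysis:

Since the chains do not all have the same length, we take the expected (average) length. For keys x and y, let c_{x,y} be 1 if h(x) = h(y) and 0 otherwise (the indicator implied by the expectation computed below).

The length function is

$$l(x) = \sum_{y \in T} c_{x,y},$$

where x and y are both in T, and T is the set of elements stored in the slots of the table.

$$E(l(x)) = E\left(\sum_{y \in T} c_{x,y}\right) = \sum_{y \in T} E(c_{x,y})$$

Now,

$$E(c_{x,y}) = 1 \cdot P(c_{x,y} = 1) + 0 \cdot P(c_{x,y} = 0) = 1 \cdot P(h(x) = h(y)) = \frac{1}{m},$$

where P(h(x) = h(y)) = 1/m is the probability that two keys get the same slot, that is, have the same hash value.

So, now

$$E(l(x)) = \sum_{y \in T} \frac{1}{m} = \frac{n}{m} = \alpha,$$

where n is the number of elements in T and α is the Load Factor.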

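Search Complexity

$$
\begin{aligned}
\frac{1}{n}\sum_{i=0}^{n-1}\left(1+\frac{i}{m}\right)
&= \frac{1}{n}\left(n+\frac{n(n-1)}{2m}\right) \\
&= 1+\frac{n-1}{2m} \\
&< 1+\frac{n}{2m} \\
&= 1+\frac{\alpha}{2}
\end{aligned}
$$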

If the search is unsuccessful, a whole chain has to be traversed, and its expected length is α. When n = m we have α = 1, so in that case the search complexity is O(c), where c is a constant.
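The successful-search average (1/n) Σ (1 + i/m), for i from 0 to n−1, and its closed form 1 + (n−1)/(2m) can be checked numerically. The sketch below uses exact fractions; the function names are assumptions made for this example.

```python
from fractions import Fraction


def average_successful_search(n: int, m: int) -> Fraction:
    """Average of (1 + i/m) over i = 0 .. n-1, computed term by term."""
    total = sum(1 + Fraction(i, m) for i in range(n))
    return total / n


def closed_form(n: int, m: int) -> Fraction:
    """The simplified expression 1 + (n - 1) / (2m)."""
    return 1 + Fraction(n - 1, 2 * m)


if __name__ == "__main__":
    n, m = 10, 5
    alpha = Fraction(n, m)
    print(average_successful_search(n, m))   # term-by-term average
    print(closed_form(n, m))                 # same value from the closed form
    print(1 + alpha / 2)                     # the upper bound 1 + alpha/2
```

For n = 10 and m = 5 both expressions give 19/10, which is below the bound 1 + α/2 = 2.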

Open Addressing

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method

a hash collision is resolved by probing, or searching through alternate locations in the array (the probe

sequence) until either the target record is found, or an unused array slot is found, which indicates that

there is no such key in the table.

Insert(k): Keep probing until an empty slot is found. Once an empty slot is found, insert k.

Search(k): Keep probing until a slot whose key equals k is found or an empty slot is reached.

Delete(k): If we simply delete a key, then search may fail. So, slots of deleted keys are marked specially

as “deleted”.

Insert can insert an item in a deleted slot, but search doesn’t stop at a deleted slot.

Linear Probing

Let h(x) be the slot index computed using the hash function and S be the table size.

If slot h(x) % S is full, then we try (h(x) + 1) % S.

If (h(x) + 1) % S is also full, then we try (h(x) + 2) % S, and so on.
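The insert, search and delete rules for open addressing, combined with linear probing, can be sketched as below. The class name `LinearProbingTable`, the sentinel objects `EMPTY` and `DELETED`, and the choice h(x) = x % S are assumptions made for this example.

```python
EMPTY = object()     # slot has never held a key
DELETED = object()   # slot held a key that was removed (special "deleted" mark)


class LinearProbingTable:
    def __init__(self, size: int = 7):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key: int):
        """Yield h(x) % S, (h(x) + 1) % S, (h(x) + 2) % S, ..."""
        start = key % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def search(self, key: int) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False              # an empty slot ends the search
            if slot is not DELETED and slot == key:
                return True               # deleted slots are skipped over
        return False

    def insert(self, key: int) -> bool:
        if self.search(key):
            return True                   # already stored
        for idx in self._probe(key):
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = key     # deleted slots may be reused
                return True
        return False                      # table is full

    def delete(self, key: int) -> bool:
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot == key:
                self.slots[idx] = DELETED
                return True
        return False


if __name__ == "__main__":
    t = LinearProbingTable(7)
    for k in (6, 13, 20):                 # all three hash to slot 6
        t.insert(k)
    t.delete(13)
    print(t.search(20))                   # still found past the deleted slot
```

Marking the slot as deleted instead of emptying it is what keeps the search for 20 from stopping early at the slot that used to hold 13.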

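$$P[h(k)\ \text{is occupied}] = \frac{n}{m}$$

So,

$$E[T(m, n)] = 1 + \frac{n}{m} \cdot E[T(m-1, n-1)],$$

where the 1 is for hashing and E[T(m-1, n-1)] is for the case when the current slot is filled; then there are n-1 elements in an array of size m-1.

$$
\begin{aligned}
E[T(m, n)] &\le 1 + \frac{n}{m} \cdot \frac{m-1}{(m-1)-(n-1)} \\
&< 1 + \frac{n}{m-n} \\
&= \frac{m}{m-n} \\
&= \frac{1}{1-\frac{n}{m}} \\
&= \frac{1}{1-\alpha} \\
&= 1 + \alpha + \alpha^2 + \alpha^3 + \alpha^4 + \dots
\end{aligned}
$$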
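The recurrence E[T(m, n)] = 1 + (n/m) · E[T(m-1, n-1)] can be evaluated exactly and compared with the bound m/(m-n) = 1/(1-α). The sketch below assumes the base case E[T(m, 0)] = 1 (an empty table needs one probe) and n < m; the function names are chosen for this example.

```python
from fractions import Fraction


def expected_probes(m: int, n: int) -> Fraction:
    """Evaluate E[T(m, n)] = 1 + (n/m) * E[T(m-1, n-1)] with E[T(m, 0)] = 1."""
    if n == 0:
        return Fraction(1)
    return 1 + Fraction(n, m) * expected_probes(m - 1, n - 1)


def bound(m: int, n: int) -> Fraction:
    """The bound m / (m - n), which equals 1 / (1 - alpha) for alpha = n/m."""
    return Fraction(m, m - n)


if __name__ == "__main__":
    m, n = 10, 5
    print(expected_probes(m, n), "<=", bound(m, n))
```

With m = 10 and n = 5 (α = 1/2) the recurrence gives 11/6, which stays below the bound 1/(1-α) = 2.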

The main problem with linear probing is clustering: many consecutive occupied slots form groups, and it then takes longer to find a free slot or to search for an element.

Quadratic Probing

In this probing, h(x, i) = (h(x) + i*i) % m is used, where h(x) is the slot given by the hash function and i is the attempt number.

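If m is prime, the first ceil(m/2) probes are all distinct. To see this, assume the opposite: the first ceil(m/2) probes are not unique, so the ith and jth probes go to the same location with 0 ≤ i < j < ceil(m/2). Then

h(k) + i*i ≡ h(k) + j*j (mod m)

i*i ≡ j*j (mod m)

i*i - j*j ≡ 0 (mod m)

(i+j)(i-j) ≡ 0 (mod m)

As m is prime, m would have to divide (i+j) or (i-j). But neither is divisible by m, because i < j < ceil(m/2) gives 0 < j-i ≤ i+j < m. This contradicts the assumption, so the first ceil(m/2) probes are distinct.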

For an array of size m, m! probe sequences are possible (one per ordering of the slots). With linear or quadratic probing we can get only m of them, one for each starting slot.
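The claim that the first ceil(m/2) quadratic probes are distinct for a prime m can be checked by brute force. This is a sketch; the function names are assumptions made for this example, and `home` stands for the value h(k).

```python
def quadratic_probes(home: int, m: int, count: int) -> list:
    """Return the first `count` probe positions (home + i*i) % m."""
    return [(home + i * i) % m for i in range(count)]


def first_half_distinct(m: int) -> bool:
    """True if, for every home slot, the first ceil(m/2) probes are distinct."""
    half = (m + 1) // 2                      # ceil(m / 2)
    return all(
        len(set(quadratic_probes(home, m, half))) == half
        for home in range(m)
    )


if __name__ == "__main__":
    print(quadratic_probes(3, 11, 6))        # prime table size
    print(first_half_distinct(11))
    print(first_half_distinct(8))            # non-prime table size
```

For m = 11 the six probes starting at slot 3 are 3, 4, 7, 1, 8, 6, all different. For the non-prime m = 8, the probes from slot 0 are 0, 1, 4, 1, so a slot repeats within the first ceil(m/2) attempts.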

Double Hashing

In this probing, h(x, i) = (hash(x) + i*hash2(x)) % m is used, where hash2 is a second hash function and i is the attempt number.

Double hashing requires more computation time as two hash functions need to be computed.
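A sketch of a double-hashing probe sequence. The notes do not say which second hash function is used, so `hash2(k) = 1 + (k % (m - 1))` below is an assumption (a common choice that never returns 0), as are the names `hash1` and `double_hash_probes`.

```python
def hash1(k: int, m: int) -> int:
    """Primary hash: the home slot."""
    return k % m


def hash2(k: int, m: int) -> int:
    """Secondary hash: the step size, always between 1 and m - 1."""
    return 1 + (k % (m - 1))


def double_hash_probes(k: int, m: int) -> list:
    """Probe positions (hash1(k) + i * hash2(k)) % m for i = 0 .. m-1."""
    return [(hash1(k, m) + i * hash2(k, m)) % m for i in range(m)]


if __name__ == "__main__":
    m = 7                                   # prime table size
    print(double_hash_probes(6, m))
    print(double_hash_probes(13, m))
```

Keys 6 and 13 share the home slot 6 but use different step sizes (1 and 2), so their probe sequences diverge after the first attempt. Because m is prime, each sequence still visits every slot once.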

Advantages of Chaining:

1. Chaining is Simpler to implement.

2. In chaining, Hash table never fills up, we can always add more elements to chain. In open

addressing, table may become full.

3. Chaining is Less sensitive to the hash function or load factors.

4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted

or deleted.

5. Open addressing requires extra care to avoid clustering and to keep the load factor under control.

Advantages of Open Addressing:

1. Cache performance of chaining is not good as keys are stored using linked list. Open addressing

provides better cache performance as everything is stored in same table.

2. Wastage of Space (Some Parts of hash table in chaining are never used). In Open addressing, a

slot can be used even if an input doesn’t map to it.

3. Chaining uses extra space for links.

Perfect Hashing

With perfect hashing, we can search in O(1) time even in the worst case. This is possible if we have some domain knowledge about the data, for example when the set of keys is known in advance.

It builds on the same idea as double hashing: wherever a collision could occur, a step hashing function gives a unique slot, so the data can be found in O(1) time.
