Uploaded by Nguyễn Thị Hoài Thu

What is it?

Hashing 1

Problem

• RT&T is a large phone company that wants to provide enhanced caller ID capability:
  - given a phone number, return the caller’s name
  - phone numbers are in the range 0 to $R = 10^{10} - 1$
  - n is the number of phone numbers in use
  - the lookup should be as efficient as possible
• We know two ways to design this dictionary:
  - a balanced search tree (AVL, red-black) or a skip list with the phone number as the key has O(log n) query time and O(n) space. Space usage and search time are both good, but can we reduce the search time to constant?
  - a bucket array indexed by the phone number has optimal O(1) query time, but it needs O(n + R) space, most of which is wasted because n is far smaller than R

[Figure: bucket array with one cell per phone number: 000-000-0000, 000-000-0001, ..., 401-863-7639, ..., 999-999-9999]


Another Solution

• A hash table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table
• It is like an array, but with a function that maps the large range of keys into a smaller one
  - e.g., take the original key modulo the size of the table, and use the result as an index
• Insert item (401-863-7639, Roberto) into a table of size 5
  - 4018637639 mod 5 = 4, so item (401-863-7639, Roberto) is stored in slot 4 of the table

[Figure: table with slots 0 to 4; (401-863-7639, Roberto) occupies slot 4]

• A lookup uses the same process: map the key to an index, then check the array cell at that index
• Insert (401-863-9350, Andy): 4018639350 mod 5 = 0, so the item goes in slot 0
• Then insert (401-863-2234, Devin): 4018632234 mod 5 = 4, but slot 4 already holds Roberto. We have a collision!
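
The three insertions above can be replayed in a few lines of Python. This is only a sketch of the "key mod table size" idea: the helper name `slot_for` and the way collisions are recorded are assumptions made for illustration, not part of the slides.

```python
def slot_for(phone: str, table_size: int) -> int:
    """Drop the dashes, read the digits as one integer, reduce modulo the table size."""
    key = int(phone.replace("-", ""))
    return key % table_size


TABLE_SIZE = 5
entries = [
    ("401-863-7639", "Roberto"),
    ("401-863-9350", "Andy"),
    ("401-863-2234", "Devin"),
]

table = [None] * TABLE_SIZE
collisions = []  # items whose slot was already taken
for phone, name in entries:
    slot = slot_for(phone, TABLE_SIZE)
    if table[slot] is None:
        table[slot] = (phone, name)
    else:
        collisions.append((phone, name, slot))

print(table)       # Andy in slot 0, Roberto in slot 4
print(collisions)  # Devin also maps to slot 4
```

Without a collision strategy, the third item has nowhere to go; the following slides address exactly that.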


Collision Resolution

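• How do we deal with two keys that map to the same cell of the array?
• Use chaining
  - set up a list for each index, holding all items that map to that index

[Figure: table with slots 0 to 4, each slot pointing to the list of items stored at that index]

• The expected search, insertion, and removal time is O(n/N), provided the indices are uniformly distributed
• The performance of the data structure can be fine-tuned by changing the table size N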
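
A minimal sketch of chaining follows, assuming integer keys and the same "key mod N" index as in the earlier example. The class name `ChainedHashTable` and its method names are hypothetical, chosen only for this illustration.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, size: int = 5):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _index(self, key: int) -> int:
        return key % self.size

    def put(self, key: int, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))   # new key: append to this slot's chain

    def get(self, key: int):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                   # key not found

    def remove(self, key: int) -> bool:
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


phone_book = ChainedHashTable(size=5)
phone_book.put(4018637639, "Roberto")
phone_book.put(4018639350, "Andy")
phone_book.put(4018632234, "Devin")   # same slot as Roberto, stored in the same chain

print(phone_book.get(4018632234))
print([len(b) for b in phone_book.buckets])
```

Each operation scans only one chain, which is why the cost depends on the average chain length n/N.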


From Keys to Indices

• The mapping of keys to indices of a hash table is called a hash function
• A hash function is usually the composition of two maps:
  - hash code map: key → integer
  - compression map: integer → [0, N − 1]
• An essential requirement of the hash function is to map equal keys to equal indices
• A “good” hash function minimizes the probability of collisions
• Java provides a hashCode() method for the Object class, which typically returns a 32-bit integer derived from the memory address of the object
• This default hash code would work poorly for Integer and String objects, because two distinct objects holding the same value would get different hash codes
• Classes should therefore redefine hashCode() so that equal objects produce equal hash codes


Popular Hash-Code Maps

• Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
• Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components
• Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) $a_0 a_1 \ldots a_{n-1}$ by viewing them as the coefficients of a polynomial:
    $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$
  - The polynomial is computed with Horner’s rule, ignoring overflows, at a fixed value x:
    $a_0 + x (a_1 + x (a_2 + \cdots + x (a_{n-2} + x\, a_{n-1}) \cdots ))$
  - The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
• Why is the component-sum hash code bad for strings?
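
One way to explore the closing question is to compare the two hash codes on a set of anagrams. The sketch below is an illustration under stated assumptions: the function names are hypothetical, x = 33 is one of the values listed above, and "ignoring overflows" is imitated by keeping only the low 32 bits, since Python integers do not overflow.

```python
MASK32 = 0xFFFFFFFF  # keep the low 32 bits to imitate 32-bit overflow


def component_sum_hash(s: str) -> int:
    """Add up the character codes; the order of the characters is lost."""
    return sum(ord(c) for c in s) & MASK32


def polynomial_hash(s: str, x: int = 33) -> int:
    """Evaluate a0 + a1*x + ... + a_{n-1}*x^(n-1) with Horner's rule."""
    h = 0
    for c in reversed(s):  # start from a_{n-1}, the innermost term
        h = (h * x + ord(c)) & MASK32
    return h


anagrams = ["stop", "pots", "tops", "spot"]
print({w: component_sum_hash(w) for w in anagrams})
print({w: polynomial_hash(w) for w in anagrams})
```

Because addition is commutative, all four anagrams share one component-sum code, while the polynomial code depends on character positions and separates them.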


Popular Compression Maps

• Division: $h(k) = |k| \bmod N$
  - the choice $N = 2^p$ (a power of two) is bad because only the low-order p bits of k are taken into account
  - the table size N is usually chosen as a prime number
  - certain patterns in the hash codes are still propagated to the indices
• Multiply, Add, and Divide (MAD): $h(k) = |ak + b| \bmod N$
  - eliminates patterns provided $a \bmod N \neq 0$
  - same formula used in linear congruential (pseudo)random number generators
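
The sketch below illustrates two of the points above on a small set of hash codes: a power-of-two table size looks only at the low bits, and the MAD map degenerates when a mod N = 0. The function names and the constants a = 3, b = 7 are assumptions picked for the example.

```python
def division_compress(k: int, n: int) -> int:
    """h(k) = |k| mod N"""
    return abs(k) % n


def mad_compress(k: int, n: int, a: int = 3, b: int = 7) -> int:
    """h(k) = |a*k + b| mod N, rejecting the degenerate case a mod N == 0."""
    if a % n == 0:
        raise ValueError("a mod N must be nonzero, else every key maps to b mod N")
    return abs(a * k + b) % n


# Hash codes that are all multiples of 8: their low three bits are always zero.
codes = [8 * i for i in range(13)]

print(sorted({division_compress(c, 8) for c in codes}))   # N = 2^3: one slot only
print(sorted({division_compress(c, 13) for c in codes}))  # prime N: all 13 slots
print(sorted({mad_compress(c, 13) for c in codes}))       # MAD with a = 3, b = 7
```

With N = 8 every code lands in slot 0, while the prime N = 13 spreads the same codes over all slots.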


More on Collisions

• A collision occurs when a key is mapped to an already occupied table location
  - what to do?!?
• Use a collision handling technique
• We’ve seen Chaining
• Can also use Open Addressing
  - Double Hashing
  - Linear Probing



Linear Probing

• If the current location is used, try the next table location, and repeat until a free slot is found

linear_probing_insert(K)
    if (table is full) error
    probe = h(K)
    while (table[probe] is occupied)
        probe = (probe + 1) mod M
    table[probe] = K

• Uses less memory than chaining
  - don’t have to store all those links
• Slower than chaining
  - may have to walk along the table for a long way
• Deletion is more complex
  - either mark the deleted slot
  - or fill in the slot by shifting some elements down
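
The first deletion option, marking the deleted slot, can be sketched as follows. This is one possible reading, not the slides' own implementation: the class name `LinearProbingSet`, the sentinels `EMPTY` and `DELETED`, and the assumptions of distinct integer keys and h(K) = K mod M are all introduced here for illustration.

```python
EMPTY = object()    # slot has never been used
DELETED = object()  # slot held a key that was later removed (a marker)


class LinearProbingSet:
    """Open addressing with linear probing; assumes distinct integer keys."""

    def __init__(self, m: int = 13):
        self.m = m
        self.table = [EMPTY] * m
        self.count = 0  # number of live keys

    def insert(self, key: int) -> int:
        if self.count == self.m:
            raise OverflowError("table is full")
        probe = key % self.m
        # Marked slots can be reused, so stop at EMPTY or DELETED.
        while self.table[probe] is not EMPTY and self.table[probe] is not DELETED:
            probe = (probe + 1) % self.m
        self.table[probe] = key
        self.count += 1
        return probe

    def _find(self, key: int) -> int:
        """Return the slot holding key, or -1. A DELETED marker does not stop the search."""
        probe = key % self.m
        for _ in range(self.m):
            slot = self.table[probe]
            if slot is EMPTY:
                return -1
            if slot is not DELETED and slot == key:
                return probe
            probe = (probe + 1) % self.m
        return -1

    def contains(self, key: int) -> bool:
        return self._find(key) != -1

    def remove(self, key: int) -> bool:
        slot = self._find(key)
        if slot == -1:
            return False
        self.table[slot] = DELETED  # mark instead of emptying
        self.count -= 1
        return True


s = LinearProbingSet(13)
for k in (18, 44, 31):       # all three hash to slot 5
    s.insert(k)
s.remove(44)                 # leaves a marker in slot 6
print(s.contains(31))        # the search walks past the marker
```

If `remove` simply emptied slot 6, the search for 31 would stop there and wrongly report the key as missing.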


Linear Probing Example

• h(k) = k mod 13
• Insert keys, in this order:

    k:      18  41  22  44  59  32  31  73
    h(k):    5   2   9   5   7   6   5   8

[Figure: empty table with slots 0 to 12]


Linear Probing Example (cont.)

Final table after all insertions (- marks an empty slot):

    slot:   0   1   2   3   4   5   6   7   8   9  10  11  12
    key:    -   -  41   -   -  18  44  59  32  22  31  73   -
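
The example can be replayed with a direct transcription of the linear probing insertion into Python. This is a sketch: `None` stands for an empty slot, and the function returns the slot it used, which is a convenience added here rather than something the slides specify.

```python
def linear_probing_insert(table: list, key: int) -> int:
    """Insert key with h(k) = k mod M and linear probing; return the slot used."""
    m = len(table)
    if all(slot is not None for slot in table):
        raise OverflowError("table is full")
    probe = key % m
    while table[probe] is not None:   # occupied: try the next location
        probe = (probe + 1) % m
    table[probe] = key
    return probe


table = [None] * 13
keys = [18, 41, 22, 44, 59, 32, 31, 73]
slots = [linear_probing_insert(table, k) for k in keys]

print(slots)  # slot chosen for each key, in insertion order
print(table)  # final table contents
```

Keys 44, 32, 31, and 73 do not land on their home slots: each one walks right until it finds a free cell, and 31 has to pass five occupied slots.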


Double Hashing

• Use two hash functions: h1 gives the starting position, h2 gives the step between probes
• If M is prime, the probe sequence will eventually examine every position in the table

double_hash_insert(K)
    if (table is full) error
    probe = h1(K)
    offset = h2(K)
    while (table[probe] is occupied)
        probe = (probe + offset) mod M
    table[probe] = K

• Many of the same (dis)advantages as linear probing
• Distributes keys more uniformly than linear probing does


Double Hashing Example

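• h1(K) = K mod 13
• h2(K) = 8 − (K mod 8)
  - we want h2 to be an offset to add, so it must never be 0; here it always lies between 1 and 8
• Insert keys, in this order:

    K:       18  41  22  44  59  32  31  73
    h1(K):    5   2   9   5   7   6   5   8
    h2(K):    6   7   2   4   5   8   1   7

[Figure: empty table with slots 0 to 12]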


Double Hashing Example (cont.)

Final table after all insertions (- marks an empty slot):

    slot:   0   1   2   3   4   5   6   7   8   9  10  11  12
    key:   44   -  41  73   -  18  32  59  31  22   -   -   -
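
The double hashing example can be replayed the same way. The sketch below uses the h1 and h2 given above with a table of size 13; using `None` for an empty slot and returning the chosen slot are conventions assumed for this illustration.

```python
M = 13  # table size (prime)


def h1(k: int) -> int:
    return k % 13


def h2(k: int) -> int:
    return 8 - k % 8  # always between 1 and 8, never 0


def double_hash_insert(table: list, key: int) -> int:
    """Insert key using h1 for the start and h2 for the step; return the slot used."""
    if all(slot is not None for slot in table):
        raise OverflowError("table is full")
    probe = h1(key)
    offset = h2(key)
    while table[probe] is not None:   # occupied: jump ahead by the key's own offset
        probe = (probe + offset) % M
    table[probe] = key
    return probe


table = [None] * M
keys = [18, 41, 22, 44, 59, 32, 31, 73]
slots = [double_hash_insert(table, k) for k in keys]

print(slots)  # slot chosen for each key, in insertion order
print(table)  # final table contents
```

Keys 18, 44, and 31 all start at slot 5 but step by 6, 4, and 1 respectively, so their probe sequences diverge instead of forming one long run as in linear probing.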


Theoretical Results

• Let α = N/M, where N is the number of keys stored and M is the table size
  - the load factor: average number of keys per array index
• Analysis is probabilistic, rather than worst-case
• Expected number of probes per search:

                      Unsuccessful search                        Successful search
    Chaining          $1 + \alpha$                               $1 + \frac{\alpha}{2}$
    Linear Probing    $\frac{1}{2} + \frac{1}{2(1-\alpha)^2}$    $\frac{1}{2} + \frac{1}{2(1-\alpha)}$
    Double Hashing    $\frac{1}{1-\alpha}$                       $\frac{1}{\alpha} \ln \frac{1}{1-\alpha}$
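
To get a feel for these results, the formulas can be evaluated at a few load factors. The sketch assumes the usual reading of the table, with the first column giving the unsuccessful-search cost and the second the successful-search cost; the function name `expected_probes` is hypothetical.

```python
import math


def expected_probes(alpha: float) -> dict:
    """Return {method: (unsuccessful, successful)} expected probes for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("the open addressing formulas need 0 < alpha < 1")
    return {
        "chaining": (1 + alpha, 1 + alpha / 2),
        "linear probing": (
            0.5 + 1 / (2 * (1 - alpha) ** 2),
            0.5 + 1 / (2 * (1 - alpha)),
        ),
        "double hashing": (
            1 / (1 - alpha),
            (1 / alpha) * math.log(1 / (1 - alpha)),
        ),
    }


for alpha in (0.5, 0.75, 0.9):
    print(f"alpha = {alpha}")
    for method, (miss, hit) in expected_probes(alpha).items():
        print(f"  {method:15s} unsuccessful {miss:7.2f}   successful {hit:5.2f}")
```

At α = 0.5 all three methods need only a couple of probes; as α approaches 1 the cost of an unsuccessful search grows fastest for linear probing, because of the squared (1 − α) term in its denominator.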


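Expected Number of Probes vs. Load Factor

[Figure: number of probes (vertical axis, starting at 1.0) plotted against the load factor α (horizontal axis, marked at 0.5 and 1.0), with unsuccessful-search and successful-search curves for linear probing, double hashing, and chaining]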
