1.4K views

Uploaded by mrbkiter

Attribution Non-Commercial (BY-NC)

- Collision Resolution doc
- Hashing
- Internet Banking System
- Online Banking System
- Hashing
- A Hash Based Frequent Itemset Mining Using Rehashing
- Cuckoo Hashing
- 10-34Hash+35Apps
- test2fall00
- ABBT06
- Choosing Best Hasing
- CS102-06 Random Access
- lec7
- ADANOTES.docx
- Handout
- 34 Hash Tables
- matlab_-_slides_-_lecture_5_+_sol
- Chapter 8- Hashing
- ch19
- 109search Hash Malik Ch09

You are on page 1of 50

• Basic concepts

• Hash functions

• Collision resolution

• Open addressing

• Linked list resolution

• Bucket hashing

1

Basic Concepts

• Sequential search: O(n) Requiring several

key comparisons

• Binary search: O(log2n) before the target is

found

2

Basic Concepts

• Search complexity:

Size Binary Sequential Sequential

(Average) (Worst Case)

16 4 8 16

50 6 25 50

256 8 128 256

1,000 10 500 1,000

10,000 14 5,000 10,000

100,000 17 50,000 100,000

1,000,000 20 500,000 1,000,000

3

Basic Concepts

• Is there a search algorithm whose

complexity is O(1)?

4

Basic Concepts

• Is there a search algorithm whose

complexity is O(1)?

YES.

5

Basic Concepts

memory

keys addresses

hashing

address 6

Basic Concepts

001 Harry Lee

Key Address 002 Sarah Trapp

005 Vu Nguyen

005

Vu Nguyen

102002 Hash 100 007 Ray Black

107095

Sarah Trapp

111060 100 John Adams

7

Basic Concepts

• Home address: address produced by a hash

function.

home addresses.

8

Basic Concepts

• Synonyms: a set of keys that hash to the

same location.

inserted is already occupied by the

synonym data.

9

Basic Concepts

• Ideal hashing:

– No location collision

– Compact address space

10

Basic Concepts

Insert A, B, C

hash(A) = 9

hash(B) = 9

hash(C) = 17

A

[1] [5] [9] [17]

11

Basic Concepts

Insert A, B, C

hash(A) = 9

hash(B) = 9 B and A

hash(C) = 17 collide at 9

A B

[1] [5] [9] [17]

Collision

Resolution

12

Basic Concepts

Insert A, B, C

hash(A) = 9

hash(B) = 9 B and A C and B

hash(C) = 17 collide at 9 collide at 17

C A B

[1] [5] [9] [17]

Collision

Resolution

13

Basic Concepts

Searh for B

hash(A) = 9

hash(B) = 9

hash(C) = 17

C A B

[1] [5] [9] [17]

Probing

14

Hash Functions

• Direct hashing

• Modulo division

• Digit extraction

• Mid-square

• Folding

• Rotation

• Pseudo-random

15

Direct Hashing

• The address is the key itself:

hash(Key) = Key

16

Direct Hashing

• Advantage: there is no collision.

size) is as large as the key space

17

Modulo Division

number

• Example:

Numbering system to handle 1,000,000 employees

Data space to store up to 300 employees

18

Digit Extraction

Address = selected digits from Key

• Example:

379452 → 394

121267 → 112

378845 → 388

160252 → 102

045128 → 051

19

Mid-square

• Example:

9452 * 9452 = 89340304 → 3403

20

Mid-square

• Disadvantage: the size of the Key2 is too

large

379452: 379 * 379 = 143641 → 364

121267: 121 * 121 = 014641 → 464

045128: 045 * 045 = 002025 → 202

21

Folding

• The key is divided into parts whose size

matches the address size

Key = 123|456|789

fold shift

123 + 456 + 789 = 1368

⇒ 368

22

Folding

• The key is divided into parts whose size

matches the address size

Key = 123|456|789

123 + 456 + 789 = 1368 321 + 456 + 987 = 1764

⇒ 368 ⇒ 764

23

Rotation

• Hashing keys that are identical except for

the last character may create synonyms.

original key rotated key

600101 160010

600102 260010

600103 360010

600104 460010

600105 560010

24

Rotation

• Used in combination with fold shift

600101 → 62 160010 → 26

600102 → 63 260010 → 36

600103 → 64 360010 → 46

600104 → 65 460010 → 56

600105 → 66 560010 → 66

address space

25

Pseudorandom

Key Address

Number Generator m Division

Number

y = ax +

c

For maximum efficiency, a and c should be prime

numbers

26

Pseudorandom

• Example:

= (2061539 + 7) MOD 307 + 1

= 2061546 MOD 307 + 1

= 41 + 1

= 42

27

Collision Resolution

• Except for the direct hashing, none of the

others are one-to-one mapping

⇒ Requiring collision resolution methods

used independently with each hash function

28

Collision Resolution

• A rule of thumb: a hashed list should not be

allowed to become more than 75% full.

Load factor:

α = (k/n) x 100

n = list size

k = number of filled elements

29

Collision Resolution

• As data are added and collisions are

resolved, hashing tends to cause data to

group within the list

⇒ Clustering: data are unevenly distributed across

the list

number of probes to locate an element

⇒ Minimize clustering

30

Collision Resolution

• Primary clustering: data become clustered

around a home address.

A B C D E

[1] [9] [10] [11] [12] [13]

31

Collision Resolution

• Secondary clustering: data become

grouped along a collision path throughout a

list.

A B D E C F

[1] [9] [10] [11] [12] [13] [14] [23]

32

Open Addressing

• When a collision occurs, an unoccupied element is

searched for placing the new element in.

Linear probe

Quadratic probe

Double hashing

Key offset

33

Linear Probe

• When a home address is occupied, go to the next

address (the current address + 1).

...

34

Linear Probe

001 Mary Dodd (379452)

002 Sarah Trapp (070918)

003 Bryan Devaux (121267)

002

Harry Eagle Hash

166702

008 John Carver (378845)

307 Shouli Feldman (045128)

35

Linear Probe

001 Mary Dodd (379452)

002 Sarah Trapp (070918)

003 Bryan Devaux (121267)

004 Harry Eagle (166702)

002

Harry Eagle Hash

166702

008 John Carver (378845)

307 Shouli Feldman (045128)

36

Linear Probe

• Advantages:

quite simple to implement

data tend to remain near their home address

• Disadvantages:

produces primary clustering

37

Quadratic Probe

• The address increment is the collision probe

number squared (12, 22, 32, ...).

address list.

38

Quadratic Probe

• Disadvantage: time required to square the probe

number.

(n + 1)2 = n2 + 2n + 1

Increment

Factor

39

Quadratic Probe

Number Location Increment Address Factor Increment

1 1 12 = 1 02 1 1

2 2 22 = 4 06 3 4

3 6 32 = 9 15 5 9

4 15 42 = 16 31 7 16

5 31 52 = 25 56 9 25

6 56 62 = 36 92 11 36

7 92 72 = 49 41 13 49

40

Quadratic Probe

• Limitation: it is not possible to generate a new

address for every element in the list.

(more elements reachable)

41

Pseudorandom Probe

address from the collision address.

42

Pseudorandom Probe

probes as well): there is only one collision path for

all keys.

⇒ produces secondary clustering

43

Secondary Clustering

hash(D) = 11

A C B D

[1] [9] [11] [13] [15]

Linear probes

Search/find location for D

44

Key Offset

collision address and the key.

45

Key Offset

Address

166702 2 543 239 169

572556 2 1,865 026 050

067234 2 219 222 135

address = (543 + 02) MOD 307 + 1 = 239

46

Linked List Resolution

• Major disadvantage of Open Addressing: each

collision resolution increases the probability for

future collisions.

⇒ use linked lists to store synonyms

47

Linked List Resolution

001 Mary Dodd (379452)

002 Sarah Trapp (070918) Harry Eagle (166702)

003 Bryan Devaux (121267)

Chris Walljasper (572556)

overflow

008 John Carver (378845)

area

prime

area

306 Tuan Ngo (160252)

307 Shouli Feldman (045128)

48

Bucket Hashing

• Hashing data to buckets that can hold multiple

pieces of data.

postponed until the bucket is full.

49

Bucket Hashing

Mary Dodd (379452)

001

002 Harry Eagle (166702)

Ann Georgis (367173)

Bryan Devaux (121267)

003

linear

Chris Walljasper(572556)

probe

307

50

- Collision Resolution docUploaded byawadhesh.kumar
- HashingUploaded bySerge
- Internet Banking SystemUploaded bysyam praveen
- Online Banking SystemUploaded byifeanyi_osi
- HashingUploaded byMohammad Gulam Ahamad
- A Hash Based Frequent Itemset Mining Using RehashingUploaded byEditor IJRITCC
- Cuckoo HashingUploaded by.xml
- 10-34Hash+35AppsUploaded byEmil Devantie Brockdorff
- test2fall00Uploaded byGobara Dhan
- ABBT06Uploaded byAnonymous RrGVQj
- Choosing Best HasingUploaded bySG
- CS102-06 Random AccessUploaded byRonki Ronquillo
- lec7Uploaded byGustavo Sla
- ADANOTES.docxUploaded bychandrakala
- HandoutUploaded byPriyank Sharma
- 34 Hash TablesUploaded byKanahaiya Gupta
- matlab_-_slides_-_lecture_5_+_solUploaded byKtk Zad
- Chapter 8- HashingUploaded byAbby Jennings
- ch19Uploaded bySuyog Nigudkar
- 109search Hash Malik Ch09Uploaded bysachinsr099
- Hash TableUploaded byAyush Bhatia
- Hashing.4up 2Uploaded byAiswaryaUnnikrishnan
- Hashes DictionariesUploaded byrajnirajpal
- 10 Tablas HashUploaded byMiguel Ramirez
- Balancing Techniques in OrleansUploaded byqweqweqweqweqweqwe
- Open HashingUploaded byMohamMad Reza
- HashingUploaded byJohn Erol Evangelista
- L09-HashOpenUploaded byVidya Sagar S Danappanavar
- 12-3Uploaded byNitin Sharma
- HashUploaded byKeerthana

- GrammarParserUploaded bymrbkiter
- FuncProgUploaded bymrbkiter
- ADT_OOPUploaded bymrbkiter
- Control StructuresUploaded bymrbkiter
- Chapter 1Uploaded bymrbkiter
- IntroductionUploaded bymrbkiter
- LexUploaded bymrbkiter
- Approximation AlgorithmsUploaded bymrbkiter
- SortingUploaded bymrbkiter
- Search TreeUploaded bymrbkiter
- RecursionUploaded bymrbkiter
- Multiway TreeUploaded bymrbkiter
- Linked ListUploaded bymrbkiter
- Heap SortUploaded bymrbkiter
- GraphUploaded bymrbkiter
- Binary TreeUploaded bymrbkiter
- IntroductionUploaded bymrbkiter
- Chapter 8Uploaded bymrbkiter
- Chapter 7Uploaded bymrbkiter
- Chapter 6Uploaded bymrbkiter
- Chapter 5Uploaded byPritha Gupta
- Chapter 4Uploaded bymrbkiter
- Chapter 3Uploaded bymrbkiter
- Chapter 2Uploaded bymrbkiter

- SAEP-389Uploaded byAnonymous 4IpmN7On
- Partial differential equationsUploaded byYunelsy Nápoles Alvarez
- Maths For ScienceUploaded byArsalan Ahmed Usmani
- 12 Maths Key Notes Ch 04 DeterminantUploaded byYazhisai Selvi
- PLATO Courses MapUploaded byjetlake11
- Maths Guidelines for Board Exams.pdfUploaded byheena
- 2012 10 30 ReInventing the Wheel Part1 R 5Uploaded byNiku Marian
- 4 Unit III Statistical TestsUploaded byashu
- Acids and Bases Calculations NotesUploaded byAnonymous RAbkanq
- mit quiz solutions 2Uploaded byjethros1
- Programing manual STAR TSP700_800.pdfUploaded byKlema Hanis
- Casio-TE100.pdfUploaded byAliang
- Breakthrough in Conformal Mapping_JUploaded byDante Freidzon
- Risk Assessment of Ship Platform ColissionUploaded by123habib123fikri
- Autocad 2013 TutorialUploaded byLechovolea Victor Catalin
- Lab01 Intake ManifoldUploaded byMatteo Torino
- Neural Nets and Deep LearningUploaded byManu Carbonell
- FEDNY Interest Rates and the Market for New Light VehiclesUploaded bysdinu
- Formulation of Linear Programming Model - UpdatedUploaded byArslan Hamid
- Topic 3 Oscilloscope and Signal GeneratorUploaded byChin
- El concepto de equilibrio en diferentes tradiciones económicasUploaded byelias
- Gat Formula ListUploaded byhasanejaz2001
- Interpolation of ContrastUploaded byAnonymous jdIpkGzehi
- open channelUploaded byUnpredictable PriNce AnkitSharma
- Lateral Stability DerivativesUploaded byCodenameairi Ally
- gui_helpUploaded byart101988
- BeerVectorDynamicsISM_Ch11_v2Uploaded byJoe Shmo
- ANSYS CFX Reference GuideUploaded byBhaskar Nandi
- The Non-Local Universe the New Physics and Matters of the MindUploaded byTony Al
- CommandsUploaded byghostlat