hashing cg


Algorithms

Trees

Graphs

Hashing

File Organization

UNIT 3 HASHING

Support very fast retrieval via a key

Contents

1. Hash Table

◻ Hash function, Bucket, Collision, Probe

◻ Synonym, Overflow, Open hashing, Closed hashing

◻ Perfect hash function, Load density, Full table, Load factor, Rehashing

2. Issues in hashing

◻ Hash functions: properties of a good hash function

◻ Division, Multiplication, Extraction, Mid-square, Folding and Universal, Collision

3. Collision resolution strategies

◻ Open addressing and chaining

4. Hash table overflow: extended hashing

5. Dictionary: dictionary as ADT, ordered dictionaries

6. Skip List: representation, searching and operations (insertion, removal)

Searching is one of the most frequent and prolonged tasks.

Searching means finding a particular data record in a large amount of data.

Consider the problem of searching an array for a given value.

If the array is not sorted, the search requires O(n) time

If the value ISN’T there, we need to search all n elements

If the value IS there, we search n/2 elements on average

If the array is sorted, we can do a binary search

A binary search requires O(log n) time

About equally fast whether the element is found or not
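The two searches compared above can be sketched in C as follows. This is a minimal illustration, not part of the original slides; the function names linear_search and binary_search are hypothetical, and the binary search assumes the array is sorted in ascending order.

```c
#include <stddef.h>

/* Linear search: works on any array, examines up to n elements. O(n). */
int linear_search(const int *a, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == value)
            return (int)i;          /* found: return the index */
    return -1;                      /* not found after checking all n elements */
}

/* Binary search: requires an ascending sorted array. O(log n). */
int binary_search(const int *a, size_t n, int value)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] == value)
            return (int)mid;
        if (a[mid] < value)
            lo = mid + 1;           /* value can only be in the upper half */
        else
            hi = mid;               /* value can only be in the lower half */
    }
    return -1;                      /* range became empty: not found */
}
```

Both functions return the index of the value, or -1 when it is absent. The binary search halves the range on every step, which is why it is about equally fast for hits and misses.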

Can we do even better?

How about an O(1), that is, constant-time search?

We can do it if the array is organized in a particular way.

Search performance

searches.

From linear search to binary search, the search efficiency improved from O(n) to O(log n).

Another data structure, called a hash table, helps to increase the search efficiency to O(1), or constant time.

HASHING is a method of directly computing the address of a record from its key by using a suitable mathematical function called the hash function.

Hash Table – Data structure for hashing

A hash table stores <key, information> pairs.

It is a data structure that stores elements and allows insertions, lookups, and deletions in O(1) time on average.

It is an alternative method for dictionary representation.

A hash function is used to map keys into their positions in the table. This mapping is called hashing.

Hash table operations:

Search – Compute hash function f(k) & CHECK if a pair exists.

Insert – Compute function f(k) & PLACE it in appropriate position.

Delete – Compute function f(k) & DELETE the pair in that position.

In an ideal scenario (no collisions), hash table search/insert/delete takes Θ(1) time.
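The three operations can be sketched in C for the ideal, collision-free case. This is only an illustration under stated assumptions: integer keys that are non-negative, f(k) = k % TABLE_SIZE, one pair per position, and the hypothetical names Slot, ht_insert, ht_search and ht_delete. A colliding insert is simply rejected here; real tables need a collision resolution strategy.

```c
#include <stdbool.h>

#define TABLE_SIZE 10            /* hypothetical table size */

typedef struct {
    bool used;                   /* is this position occupied? */
    int  key;
    int  info;                   /* the information stored with the key */
} Slot;

/* f(k): map a non-negative key to a table position. */
static int f(int key) { return key % TABLE_SIZE; }

/* Insert: compute f(k) and place the pair there. Returns false on collision. */
bool ht_insert(Slot table[], int key, int info)
{
    Slot *s = &table[f(key)];
    if (s->used && s->key != key)
        return false;            /* position already taken by another key */
    s->used = true;
    s->key = key;
    s->info = info;
    return true;
}

/* Search: compute f(k) and check whether the pair exists. */
bool ht_search(const Slot table[], int key, int *info_out)
{
    const Slot *s = &table[f(key)];
    if (!s->used || s->key != key)
        return false;
    *info_out = s->info;
    return true;
}

/* Delete: compute f(k) and delete the pair in that position. */
bool ht_delete(Slot table[], int key)
{
    Slot *s = &table[f(key)];
    if (!s->used || s->key != key)
        return false;
    s->used = false;
    return true;
}
```

Each operation touches exactly one position, which is where the constant-time behaviour comes from.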

Hash Table = Array + Hash function

A hash table is made up of two parts:

an array (the actual table where the data to be searched is stored) and

a mapping function, known as a hash function.

The hash function is a mapping from the input space to the integer space that defines the indices of the array.

Hashing

The hash function provides a way of assigning numbers to the input such that the data can be stored at the array index corresponding to the assigned number.

Hashing is similar to indexing, as it involves associating a key with a relative record address.

With hashing, the generated address appears to be random: there is no obvious connection between the key and the location of the corresponding record. For this reason hashing is sometimes referred to as randomizing.

With hashing, two different keys may be transformed to the same address, so two records may be sent to the same place in a file. This is a collision.

Two or more keys that result in the same address are known as synonyms.

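Hash Function

A hash function is a mathematical function that converts a numerical input value into another compressed numerical value.

The input to the hash function is of arbitrary length but the output is always of fixed length.

Values returned by a hash function are called message digests or simply hash values.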

Hashing - Example

Let's start with a hash table array of strings (strings are used as the data being stored and searched).

Hash table size is 12.

Hash table is an array [0 to Max − 1].

[Figure: the hash table array, with its slots labeled BUCKETS]

Hashing - Hash function

There are many possible ways to construct a hash function.

Let's take a simple hash function that takes a string as input. The returned hash value will be the sum of the ASCII codes of the characters that make up the string, mod the size of the table:

String → ∑ASCII characters % table_size → Hash Value

int hash(const char *str, int table_size)
{
    int sum = 0;

    /* sum of all characters */
    for ( ; *str; str++) sum += *str;

    /* reduce the sum to a valid table index */
    return sum % table_size;
}

Example

Let's store a string in the table: "Steve".

We run "Steve" through the hash function, and find that hash("Steve",12) yields 3:

S:83 t:116 e:101 v:118 e:101

83+116+101+118+101 = 519

519 % 12 = 3

3 is the hash value of Steve, so "Steve" is stored at index 3.

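Example

Let's try another string: "Spark".

We run "Spark" through the hash function, and find that hash("Spark",12) yields 9:

S:83 p:112 a:97 r:114 k:107

83+112+97+114+107 = 513

513 % 12 = 9

9 is the hash value of Spark, so "Spark" is stored at index 9.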
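The worked examples can be checked with a small C sketch. The hash function follows the description in the slides (sum of character codes mod table size); the store helper, the TABLE_SIZE macro and the use of NULL for an empty slot are assumptions added for illustration.

```c
#include <stddef.h>

#define TABLE_SIZE 12            /* table size used in the examples */

/* Sum of the character codes of str, reduced mod table_size. */
int hash(const char *str, int table_size)
{
    int sum = 0;
    for ( ; *str; str++)
        sum += (unsigned char)*str;
    return sum % table_size;
}

/* Hypothetical helper: store str at its hash index.
 * Returns the index used, or -1 if that slot is already occupied. */
int store(const char *table[], const char *str)
{
    int idx = hash(str, TABLE_SIZE);
    if (table[idx] != NULL)
        return -1;               /* collision: another string hashed here */
    table[idx] = str;
    return idx;
}
```

With this function, hash("Steve", 12) evaluates to 519 % 12 = 3 and hash("Spark", 12) evaluates to 513 % 12 = 9.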

Key Terms used in Hashing

Hash Table: A hash table is an array [0 to Max − 1] of size Max. For better performance, keep the table size a prime number.

Hash Function: A hash function is a mathematical function that maps an input value into an index / address (i.e. it transforms a key into an address).

Bucket: A bucket is an index position in a hash table that can store more than one record. When two keys map to the same index, both records are stored in the same bucket. With bucket size 1 the second record does not fit, and this is called a collision. The alternative is to use buckets that hold multiple records.

Key Terms

Probe: Each address calculation and check for success is called a probe.

Running "Spark" through the hash function and finding index 9 (513 % 12) is a probe.

Key Terms

Collision: The result of two keys hashing into the same address is called a collision.

With bucket size = 1 and hash function Key % Table_size, where Table_size = 10:

25 % 10 = 5

55 % 10 = 5

Both keys map to index 5: COLLISION

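Key Terms

Synonyms: Keys that hash to the same address are called synonyms.

For example, 25 and 55 are synonyms under Key % 10.

"Alka" and "Abhay" are synonyms: with the string hash function and table size 12, both hash to 5 (377 % 12 and 485 % 12).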

Key Terms

Overflow: Many keys hashing to a single address, combined with a lack of room in the bucket, is known as an overflow.

Collision and overflow are synonymous when the bucket is of size 1.

Key Terms

Open or external hashing: When we allow records to be stored in potentially unlimited space, it is called open or external hashing.

How to handle a bucket of size 1 with unlimited space?

Each bucket in the hash table is the head of a linked list.

All elements that hash to a particular bucket are placed on that bucket's linked list.

With hash function Key % 10, records that collide are stored outside the table.
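Open (external) hashing with linked lists can be sketched as follows. This is a minimal illustration assuming non-negative integer keys and the Key % 10 hash function from the slide; the names Node, chain_insert, chain_search, chain_length and chain_free are hypothetical.

```c
#include <stdlib.h>

#define TABLE_SIZE 10                 /* matches the Key % 10 example */

typedef struct Node {
    int key;
    struct Node *next;
} Node;

/* Insert key at the front of its bucket's list.
 * Returns 0 on success, -1 if memory could not be allocated. */
int chain_insert(Node *table[], int key)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    int idx = key % TABLE_SIZE;       /* assumes non-negative keys */
    n->key = key;
    n->next = table[idx];             /* colliding keys simply extend the list */
    table[idx] = n;
    return 0;
}

/* Return 1 if key is present in its bucket's list, 0 otherwise. */
int chain_search(Node *table[], int key)
{
    for (const Node *n = table[key % TABLE_SIZE]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}

/* Number of keys currently stored in bucket idx. */
int chain_length(Node *table[], int idx)
{
    int len = 0;
    for (const Node *n = table[idx]; n != NULL; n = n->next)
        len++;
    return len;
}

/* Release every node and reset all buckets to empty. */
void chain_free(Node *table[])
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        while (table[i] != NULL) {
            Node *next = table[i]->next;
            free(table[i]);
            table[i] = next;
        }
    }
}
```

Inserting 25 and then 55 puts both keys on the list headed by bucket 5, so the table itself never runs out of room; the cost is that a search may have to walk a list.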

Application - Open / External Hashing

The target address space is made of buckets, each of which holds multiple files.

A bucket is either one disk block or a cluster of contiguous disk blocks.

Example (LINUX): a reference (index) to the files and directories on the system.

Key Terms

Closed or internal hashing: When we use fixed space for storage, eventually limiting the number of records to be stored, it is called closed or internal hashing.

records ?

Collisions result in storing one of the records at another slot in the table.

The number of records that can be stored is limited by the table size.
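One way to pick "another slot in the table" is linear probing: try the next position, wrapping around, until a free slot is found. The slides do not name a specific method at this point, so linear probing is an assumption here, as are the names probe_init, probe_insert, probe_search and the EMPTY marker. Keys are assumed to be non-negative integers.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)                    /* marks an unused slot; keys must be >= 0 */

/* Mark every slot as unused. */
void probe_init(int table[])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key, probing forward from its home slot.
 * Returns the index used, or -1 if the table is full. */
int probe_insert(int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY || table[idx] == key) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;                        /* fixed space: no room left */
}

/* Return the index holding key, or -1 if it is not in the table. */
int probe_search(const int table[], int key)
{
    int home = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int idx = (home + i) % TABLE_SIZE;
        if (table[idx] == EMPTY)
            return -1;                /* an empty slot ends the probe sequence */
        if (table[idx] == key)
            return idx;
    }
    return -1;
}
```

With keys 25 and 55, the first lands at index 5 and the second is moved to index 6. Deletion is left out of this sketch because removing a key from a probe sequence needs extra care (for example a "deleted" marker).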

Key Terms used in Hashing

Perfect Hash Function: A hash function that transforms different keys into different addresses with NO collisions is called a perfect hash function. The worth of a hash function depends on how well it avoids collisions.

Load density: The maximum storage capacity, i.e. the maximum number of records that can be accommodated, is called the loading density.

Full Table: A table in which all locations are occupied. Depending on the characteristics of the hash function, the table should not be allowed to become more than 75% full, so that collisions can still be handled.

Key Terms

Load factor: The number of records stored in a table divided by the maximum capacity of the table, expressed as a percentage.

Key Terms

Rehashing: When we try to store the record with Key1 at the bucket position Hash(Key1) and find that it already holds a record, it is a collision situation. We can use a new hash function, or the same hash function again, to place the record with Key1.

OR

If the table gets full, build another table that is about twice as big, with an associated NEW hash function. The original table is scanned, and its elements are re-inserted into the new table using the new hash function.

Key Terms

Re-hashing example

Consider a table of size 7 with hash function Key % 7.

Elements: 13, 15, 24, 14, 23, 19

After inserting 13, 15, 24, 14, 23 the table holds 5 records. If 19 is inserted, the table will be over 85% full (6 of 7 slots), which will affect the search performance.

• New table size: 17 (7*2 = 14, and the next prime is 17)

• New hash function: Key % 17

• The old table is scanned and all the elements are inserted into the new table.
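The re-hashing example can be sketched in C. This is an illustration under stated assumptions: non-negative integer keys, hash function key % size, linear probing for any collisions (the example keys happen not to collide), and the hypothetical names Table, table_init, table_insert, load_percent, next_prime and table_rehash.

```c
#include <stdlib.h>

#define EMPTY (-1)               /* marks an unused slot; keys must be >= 0 */

typedef struct {
    int *slots;                  /* the table itself */
    int  size;                   /* maximum capacity */
    int  count;                  /* records currently stored */
} Table;

/* Allocate an empty table of the given size. Returns 0 on success. */
int table_init(Table *t, int size)
{
    t->slots = malloc((size_t)size * sizeof *t->slots);
    if (t->slots == NULL)
        return -1;
    for (int i = 0; i < size; i++)
        t->slots[i] = EMPTY;
    t->size = size;
    t->count = 0;
    return 0;
}

/* Insert using key % size, probing linearly on collision (an assumption).
 * Returns the index used, or -1 if the table is full. */
int table_insert(Table *t, int key)
{
    for (int i = 0; i < t->size; i++) {
        int idx = (key % t->size + i) % t->size;
        if (t->slots[idx] == EMPTY) {
            t->slots[idx] = key;
            t->count++;
            return idx;
        }
    }
    return -1;
}

/* Load factor as a whole-number percentage (truncated). */
int load_percent(const Table *t)
{
    return 100 * t->count / t->size;
}

/* Smallest prime >= n, by trial division (fine for small tables). */
int next_prime(int n)
{
    if (n < 2)
        n = 2;
    for (;; n++) {
        int prime = 1;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) { prime = 0; break; }
        if (prime)
            return n;
    }
}

/* Build a table about twice as big, scan the old one, re-insert everything. */
int table_rehash(Table *t)
{
    Table bigger;
    if (table_init(&bigger, next_prime(2 * t->size)) != 0)
        return -1;
    for (int i = 0; i < t->size; i++)
        if (t->slots[i] != EMPTY)
            table_insert(&bigger, t->slots[i]);   /* uses key % new size */
    free(t->slots);
    *t = bigger;
    return 0;
}
```

Starting from a table of size 7 holding 13, 15, 24, 14, 23 and 19, load_percent reports 85 and table_rehash moves the keys into a table of size 17, where for instance 24 now lives at index 24 % 17 = 7.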

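Issues in Hashing

There are two main issues to consider in hashing:

1. A good hash function that minimizes the number of collisions.

2. An efficient collision resolution strategy to store or locate synonyms.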

Features of a good hash function

Addresses generated from the key are uniformly and randomly distributed.

Small variations in the value of the key will cause large variations in the record addresses, so that records with similar keys are distributed evenly.

The hashing function must minimize the occurrence of collisions.

The hash function should generate different hash values for similar strings.

The resultant index must be within the table index range.
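To see why these features matter, compare the additive string hash used earlier with a position-sensitive one. The sketch below is an illustration only: sum_hash mirrors the sum-of-characters function from the slides, while poly_hash is a swapped-in polynomial hash with multiplier 31, which is an assumption and not a method taken from the slides.

```c
/* Additive hash: sum of character codes mod table_size.
 * Strings with the same letters in a different order always collide. */
unsigned long sum_hash(const char *str, unsigned long table_size)
{
    unsigned long sum = 0;
    for ( ; *str; str++)
        sum += (unsigned char)*str;
    return sum % table_size;     /* result is always within 0..table_size-1 */
}

/* Polynomial hash (assumed alternative): each character's contribution
 * depends on its position, so reordering the letters changes the value. */
unsigned long poly_hash(const char *str, unsigned long table_size)
{
    unsigned long h = 0;
    for ( ; *str; str++)
        h = h * 31 + (unsigned char)*str;   /* unsigned overflow wraps, which is fine */
    return h % table_size;       /* result is always within 0..table_size-1 */
}
```

For the similar strings "listen" and "silent" with table size 12, sum_hash gives 7 for both (the character sum is 655 either way), while poly_hash gives 7 and 1. Both functions satisfy the last property, since the final % keeps the index within the table range.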

