57 views

Uploaded by RAYUDU8

- Optical Wireless Communication for Auv
- Datastage ETL Standards-Hcl
- 4690 POS Keyed Files Characteristics and Limitations
- Test_Case_1.4
- SQL Queries for Practice
- [New Gpes]User Manual En
- Hashing Best Study
- Cassandra Succinctly
- “You steal music I lock your PC” Virus – Unblock Computer Instruction
- 58116356 Standard UI Test Cases
- Temporal Hashing
- Us 8549004
- 2015-03-16
- YT1-1627-000 H.264Data Protocol Tutorial E
- Online Reorg
- Help With Data Array for Hash Table Creation (Java in General Forum at JavaRanch)
- Art Project
- Main
- Java Collections Interview Questions
- 10 Hashing

You are on page 1of 12

function on the search key value.

Hash indexes are best for equality searches. Cannot support range queries.

Hash function [h(key)]: Maps the key to a bucket (κάδο) where the key is

expected to belong.

and at random - similar hash keys should be hashed to very different

buckets.

• Low Cost. Plain hash functions (rather than cryptographic hash functions

such as

MD5,SHA1) usually have a low computational cost.

• Determinism: for a given input value it always generates same hashvalue.

We shall utilize a Trivial Ηash Function i.e., the data itself (interpreted as an

integer in binary notation). E.g., 4410 = 1011002

power of two (i.e., N=2d, dεN), then these are the d least significant bits.

Static Hashing:

• Data entries are stored on a number of successive primary pages

1

• Primary pages are fixed, allocated sequentially during index construction.

Overflow pages are utilized when primary pages set full.

• Primary pages are never de-allocated during deletions.

• This is similar to the way of ISAM File organization.

• Insert/Delete: 2 I/Os (read and write) page.

• Drawback: Long overflow chains can develop and degrade performance.

overflow pages if the file does not grow too much.

2. Rehashing: Hash the file with a different hash function (see next slide)

to achieve 80% occupancy and no overflows.

Drawback: Takes time (we need to rehash the complete DB)!

3. Dynamic Hashing: Allow the hash function to be modified dynamically

to accommodate the growth/shrink of the database (i.e., essentially

rehash selected, rather than all, items)

• Extendible Hashing

• Linear Hashing

Extendible Hashing:

example:

doubling the number of buckets?

2

• Answer for this is – the entire file has to be read and written back to

disk twice to achieve the organization which is expensive.

directory instead of doubling the data file.

• Directory is much smaller than file, so doubling the directory is much

cheaper.

• Split just the bucket that overflowed not all of them i.e., only one page

of data entries is split and no overflow pages are constructed.

3

Example:

Extendible hashing – SEARCH:

Locate data entry r with has value h(r ) =5 ( binary value 101). Look at

directory element 01 ie Global depth least significant bits of h(r ). We

then follow the pointer to the data page (bucket B in the below figure).

Algorithm:

• Find target buffer: Done similarly to search.

• If target bucket is not full, insert and finish (e.g., insert h( r) =9 which

is binary is 1001 can be inserted to bucket B)

• If target bucket is full, split it (allocate new page and re-distribute).

Insertion of h(r ) =20 (10100) causes the split of bucket A and

redistribution between A and A2.

4

• Inserting h( r) =20 causes directory doubling.

• When target bucket is full and local depth is equal to global depth then

the bucket split causes directory doubling.

• Notice that after doubling some pointer are redundant and these are

utilized in subsequent inserts.

5

Global depth of directory: Tells us how many least significant bits to

utilized during the selection of the target bucket.

o Initially equal to log2(#Buckets), e.g., log28=3

o Directory Doubles => Increment Global Depth

utilized to determine if any entry belongs to a given bucket.

o Bucket is split Increment local depth

are continuously splitted leaving in a way that the local depth of other

nodes is same while global depth increases).

• If removal of data entry makes bucket empty then merge with `split

image’ (e.g., delete 32,16, then merge with A2).

• If each directory element points to same bucket as its split image, can

halve directory (although not necessary for correctness)

6

• Equality Search Cost: If directory fits in memory then answered with 1

disk access or else 2.

• Static Hashing on the other hand performs equlity searches with 1 I/O

(assuming no collisions)

• Yet, the extendible hashing directory can usually easily fits in main

memory.

• Directory can grow large if the distribution of hash value is skewed

( some buckets are utilized by many keys, while others remain empty).

• Multiple entries with same hash value(collisions) cause problem… as

splitting will not redistribute equally the keys.

Linear Hashing:-

• Linear Hashing avoids directory by splitting buckets in round robin

fashion and uses over flow pages.

• Over flow pages are not to be long. Space utilization could be lower

than EH, since splits are not concentrated on dense data areas.

• Can tune criterion for triggering splits.

• LH handles the problem of long overflow chains (presented in Static

Hashing) without using a directory

• Idea: Use a family of hash functions h0, h1, h2, ... where each hash

function maps the elements to twice the range of its predecessor, i.e.,

7

• if hi(r) maps a data entry r into M buckets, then hi+1(r) maps a

data entry into one of 2M buckets. Hash functions are like below:

hi(key) = h(key) mod(2iN), i=0,1,2… and N=“initial-#-of-

buckets”

• We proceed in rounds of splits: During round Level only hLevel(r) and

hLevel+1(r) are in use.

• The buckets in the file are split (every time we have an overflow), one-

by-one from the first to the last bucket, thereby doubling the number

of buckets.

INSERTION Algorithm:

• Find target buffer (similarly to search with hLevel(r) and hLevel+1(r) )

• If target bucket is NOT full, insert and finish (e.g., insert h(r)=9,

which is binary 1001, can be inserted to bucket B).

• If target bucket is full:

o Add overflow page and insert data entry. (e.g., by inserting

h(r)=43 (101011) causes the split of bucket A and redistribution

between A and A2

o Split Next bucket and increment Next (can be performed in batch

mode)

Note that 32(100000), 44(101100), 36(100100)

8

• The buckets in the file are split (every time we have an overflow), one-

by-one from the first to the last bucket ΝR (using Next index), thereby

doubling the number of buckets.

• Since buckets are split round-robin, long overflow chains don’t develop

(like static hashing) as eventually every bucket is split!

• LH can choose any criterion to `trigger’split :

– e.g., Split whenever an overflow page is added.

– e.g., Split whenever the index is 75% full.

– Many other heuristics could be utilized.

Increase Level after Insert:

• If Next = NR (after overflow) then level is increased by 1 (thus h2, h1

will be utilized) and Next becomes 0

• Below the addition of 50* (110010)causes Next to become equal thus

one the Level is increased by one.

Overview of LH File:

Assume that we are in the middle of an execution, then the Linear Hash file

has the following structure.

9

SEARCHING:

Search:

To find bucket for data entry r find hLevel(r):

Split Bucket: If hLevel(r) maps to bucket smaller than Νext (i.e., a was split

previously r could belong to bucket that previously, then bucket hLevel(r) or

bucket hLevel(r) + NR; must apply hLevel+1(r) to find out.

10

LH described as a variant of EH:

• Begin with an Extendible Hashing index where directory has N

elements.

• Use overflow pages (rather than splitting the overflowed page as this

happens in EH) and split buckets in round-robin (similar to linear

hashing).

• First split is at bucket 0. (Imagine directory being doubled at this

point.)

• But elements 1==N+1, 2==N+2 (recall the EH redundant pointers)

• So, need only create directory element N, which differs from 0 for now.

• When bucket 1 splits, create directory element N+1, etc.

• Also, primary bucket pages are created in order (consequently correct

page can be located with a simple offset calculation). Thus, we actually

don’t need a directory.

• Vise-versa that is the same with LH.

11

•

12

- Optical Wireless Communication for AuvUploaded byNirmal Thomas
- Datastage ETL Standards-HclUploaded byVishal Choksi
- 4690 POS Keyed Files Characteristics and LimitationsUploaded byjssanche
- Test_Case_1.4Uploaded byAbhishek Soni
- SQL Queries for PracticeUploaded byrboy1993
- [New Gpes]User Manual EnUploaded byHenry Huayhua
- Hashing Best StudyUploaded byShanmu Technocrat
- Cassandra SuccinctlyUploaded byWilliamette Core
- “You steal music I lock your PC” Virus – Unblock Computer InstructionUploaded byCraig Hill
- 58116356 Standard UI Test CasesUploaded byRajanibtechcs Csi
- Temporal HashingUploaded byEdisonChimuelo
- Us 8549004Uploaded bydavid19775891
- 2015-03-16Uploaded byDanang Di Kaligawe
- Online ReorgUploaded byKaushik Gupta
- Help With Data Array for Hash Table Creation (Java in General Forum at JavaRanch)Uploaded bypavani21
- MainUploaded byAnkit Garg
- YT1-1627-000 H.264Data Protocol Tutorial EUploaded bybalikci
- Art ProjectUploaded byBrij Anand
- Java Collections Interview QuestionsUploaded byKedar Nath
- 10 HashingUploaded byAshley Jain
- homework2TanvirUploaded byshoummow
- 00fcsUploaded bysfofoby
- JAVA MaterialUploaded byHarshitaAbhinavSrivastava
- 10.1.1.3Uploaded bycpkcool
- Virtual Memory LeeUploaded byJaspreet Kaur
- serviceconsoletrainingUploaded byapi-308152776
- OUTPUT.docUploaded byLüv Kumar
- uroboros.txtUploaded byManuel Collier
- Pertemuan 5 6 dan 7Uploaded bynabila z
- LTE Tracking Area UpdateUploaded byArif Maldini

- Kuliah 11 Optimization and Linear ProgrammingUploaded byJayanti Tangdiombo
- Cazacu Razvan(Mechanics)-Overview of Structural Topology Optimization Methods for Plane and Solid StructuresUploaded byArdeshir Bahreininejad
- n10-mincostflowUploaded byttungl
- icfp01Uploaded byAnonymous RrGVQj
- Music Through Fourier SpaceUploaded byFederico Larrosa
- MTE-06Uploaded byVipin
- Double Pendulum engineeringUploaded byvillanuevamarkd
- aho-3.7Uploaded byIovanny
- 5-8 Rational Zero Theorem[1]Uploaded byEnas Basheer
- Solutions KemperUploaded byMallu Oliveira
- Root Locus and matlab programmingUploaded byshaista005
- Linear Algebra (SM), KuttlerUploaded bycantor2000
- Skema Matematik k1 & k2 Mibf Spm 2012 Set1Uploaded byWan Zainul Akmar
- Algebra 2 Honors PBAUploaded byJulie
- En1 Periodic Signals Fourier Series2016Uploaded byAna-Maria Raluca Chiru
- thesisFINAL - Dalia Almaatani.pdfUploaded byYukitoAkashina
- 0 Optimization of Chemical Processes.pdfUploaded byAugusto De La Cruz Camayo
- blasiusUploaded byNaris Vongtyanuvat
- Foc LaplaceUploaded bymantascita
- Shortcut in Solving Linear EquationsUploaded bybala1307
- Discrete MathUploaded byMohsin Yasin
- NMF in RUploaded byHAlperTayalı
- Lagrange MultipliersUploaded byAshish Zachariah
- History m.algebraUploaded bySyahira Yusof
- Generalized Hankel-Clifford transformation for a class of tempered ultradistributionsUploaded byAnonymous vQrJlEN
- 253 book 1Uploaded byAnonymous s4yhkk
- Applied Mathematics Mam1044H sunflower and fractal projectUploaded byChloe Sole
- 6 BC.pdfUploaded bySaher
- Elasticity Mohammed Ameen.PDFUploaded bySaleem Mohammed
- Data Structures and AlgorithmsUploaded byDavid Mihai