
Indexing and Hashing

Index
A database index is a data structure that improves the speed of data retrieval operations on a
database table at the cost of slower writes and the use of more storage space.


Basic Concepts
Many queries reference only a small proportion of the records in a file. For example, a query
like "Find all accounts at the Perryridge branch" or "Find the balance of account number
A-101" references only a fraction of the account records. It is inefficient for the system to read
every record and to check the branch-name field for the name "Perryridge," or the
account-number field for the value A-101. Ideally, the system should be able to locate these
records directly. To allow these forms of access, we design additional structures that we
associate with files.

An index for a file in a database system works in much the same way as the index in this
textbook. If we want to learn about a particular topic, we can search for the topic in the index
at the back of the book, find the pages where it occurs, and then read the pages to find the
information we are looking for. The words in the index are in sorted order, making it easy to
find the word we are looking for. Moreover, the index is much smaller than the book, further
reducing the effort needed to find the words we are looking for.
Database system indices play the same role as book indices or card catalogs in libraries. For
example, to retrieve an account record given the account number, the database system would
look up an index to find on which disk block the corresponding record resides, and then fetch
the disk block to get the account record.
Keeping a sorted list of account numbers would not work well on very large databases with
millions of accounts, since the index would itself be very big; further, even though keeping
the index sorted reduces the search time, finding an account can still be rather
time-consuming. Instead, more sophisticated indexing techniques may be used.

Types of Indices

There are two basic kinds of indices:

- Ordered Indices: Based on a sorted ordering of the values.
- Hash Indices: Based on a uniform distribution of values across a range of buckets. The bucket
  to which a value is assigned is determined by a function, called a hash function.

Evaluation Factors/Criteria

There are several techniques for both ordered indexing and hashing. No one technique is the
best. Rather, each technique is best suited to particular database applications. Each technique
must be evaluated on the basis of the following factors:
- Access types: The types of access that are supported efficiently. Access types can
  include finding records with a specified attribute value and finding records whose
  attribute values fall in a specified range.
- Access time: The time it takes to find a particular data item, or set of items, using the
  technique in question.
- Insertion time: The time it takes to insert a new data item. This value includes the
  time it takes to find the correct place to insert the new data item, as well as the time it
  takes to update the index structure.
- Deletion time: The time it takes to delete a data item. This value includes the time it
  takes to find the item to be deleted, as well as the time it takes to update the index
  structure.
- Space overhead: The additional space occupied by an index structure. Provided that
  the amount of additional space is moderate, it is usually worthwhile to sacrifice the
  space to achieve improved performance.

Search Key
An attribute or set of attributes used to look up records in a file is called a search key.

Ordered Indices
To gain fast random access to records in a file, we can use an index structure. Each index
structure is associated with a particular search key. An ordered index stores the values of the
search keys in sorted order, and associates with each search key the records that contain it. A
file may have several indices, on different search keys.

Primary (clustered) Index: If the file containing the records is sequentially ordered, a
primary index is an index whose search key also defines the sequential order of the file.
Primary indices are also called clustering indices.

There are two types of ordered indices that we can use:

Dense Index: An index record appears for every search-key value in the file. In a
dense primary index, the index record contains the search-key value and a pointer to
the first data record with that search-key value. The rest of the records with the same
search-key value are stored sequentially after the first record, since, because the
index is a primary one, records are sorted on the same search key.

[Figure: dense index on the account file, with one index entry for each branch name pointing to the first account record of that branch]

Fig. 1: Dense Index

Sparse Index: An index record appears for only some of the search-key values. Each
index record contains a search-key value and a pointer to the first data record with
that search-key value. To locate a record, we find the index entry with the largest
search-key value that is less than or equal to the search-key value for which we are
looking. We start at the record pointed to by that index entry, and follow the pointers
in the file until we find the desired record.
[Figure: sparse index on the account file, with index entries for only some of the branch names]

Fig. 2: Sparse Index
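The sparse-index lookup rule can be sketched in a few lines of Python. This is a minimal illustration, not an actual storage engine: the file is modeled as an in-memory list sorted on branch name, list positions stand in for record pointers, and the rows, the index entries, and the names `DATA_FILE`, `SPARSE_INDEX` and `sparse_lookup` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical account file, sorted on branch name (the search key).
# Rows are sample data for illustration only.
DATA_FILE = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]

# Sparse index: (search-key value, position of the first record with that value).
# Only some of the search-key values have an entry.
SPARSE_INDEX = [("Brighton", 0), ("Mianus", 3), ("Redwood", 7)]


def sparse_lookup(key):
    """Return all records whose branch name equals key."""
    keys = [k for k, _ in SPARSE_INDEX]
    # Find the index entry with the largest key that is <= the key we want.
    slot = bisect_right(keys, key) - 1
    if slot < 0:
        return []  # key sorts before every index entry
    pos = SPARSE_INDEX[slot][1]
    matches = []
    # Follow the file in order from the pointed-to record.
    while pos < len(DATA_FILE) and DATA_FILE[pos][0] <= key:
        if DATA_FILE[pos][0] == key:
            matches.append(DATA_FILE[pos])
        pos += 1
    return matches
```

A lookup for "Perryridge" starts at the "Mianus" entry, skips one record, and then collects the three Perryridge records.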

Comparison:
It is generally faster to locate a record if we have a dense index rather than a sparse index.
However, sparse indices have advantages over dense indices in that they require less space
and they impose less maintenance overhead for insertions and deletions.
There is a trade-off that the system designer must make between access time and space
overhead. Although the decision regarding this trade-off depends on the specific application,
a good compromise is to have a sparse index with one index entry per block. The reason this
design is a good trade-off is that the dominant cost in processing a database request is the
time that it takes to bring a block from disk into main memory. Once we have brought in the
block, the time to scan the entire block is negligible. Using this sparse index, we locate the
block containing the record that we are seeking. Thus, unless the record is on an overflow
block, we minimize block accesses while keeping the size of the index as small as possible.

Multi-Level Indices
- If the primary index does not fit in memory, access becomes expensive.
- Solution: treat the primary index kept on disk as a sequential file and construct a sparse
  index on it.
  - outer index: a sparse index of the primary index
  - inner index: the primary index file
- If even the outer index is too large to fit in main memory, yet another level of index can
  be created, and so on.
- Indices at all levels must be updated on insertion or deletion from the file.

[Figure: outer index pointing to inner index blocks, which point to data blocks]

Fig. 3: Multilevel Index

An Example: Consider 100,000 records, 10 per block, at one index record per block.
That's 10,000 index records. Even if we can fit 100 index records per block, this is 100
blocks. If the index is too large to be kept in main memory, a search results in several
disk reads.

- For very large files, additional levels of indexing may be required.
- Indices must be updated at all levels when insertions or deletions require it.
- Frequently, each level of index corresponds to a unit of physical storage.
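The arithmetic in this example can be checked with a short sketch. The function name `index_levels` and its parameters are hypothetical; the sketch assumes a sparse inner index with one entry per data block and keeps adding outer levels until one level fits in a single block.

```python
import math


def index_levels(num_records, records_per_block, index_entries_per_block):
    """Return the number of index blocks at each level, innermost level first."""
    data_blocks = math.ceil(num_records / records_per_block)
    # Innermost level: one index entry per data block.
    entries = data_blocks
    levels = []
    while True:
        blocks = math.ceil(entries / index_entries_per_block)
        levels.append(blocks)
        if blocks <= 1:
            break
        # The next (outer) level has one entry per block of this level.
        entries = blocks
    return levels


print(index_levels(100_000, 10, 100))  # [100, 1]
```

With the figures above, the inner index occupies 100 blocks and a single outer block of 100 entries is enough to address it.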

Secondary (non-clustered) Index: Indices whose search key specifies an order different
from the sequential order of the file are called secondary indices, or non-clustering indices.
Secondary indices must be dense, with an index entry for every search-key value and a
pointer to every record in the file. If a secondary index stores only some of the search-key
values, records with intermediate search-key values may be anywhere in the file and, in
general, we cannot find them without searching the entire file.

Fig. 3: Secondary index on account file, on non-candidate key balance

A secondary index on a candidate key looks just like a dense primary index, except that the
records pointed to by successive values in the index are not stored sequentially. In general,
however, secondary indices may have a different structure from primary indices.
If the search key of a secondary index is not a candidate key, it is not enough to point to just
the first record with each search-key value. The remaining records with the same search-key
value could be anywhere in the file, since the records are ordered by the search key of the
primary index, rather than by the search key of the secondary index. Therefore, a secondary
index must contain pointers to all the records.
Comparison (Primary vs. Secondary):
- A sequential scan in primary index order is efficient because records in the file are
  stored physically in the same order as the index order.
- Secondary indices improve the performance of queries that use keys other than the
  search key of the primary index. However, they impose a significant overhead on
  modification of the database. The designer of a database decides which secondary
  indices are desirable on the basis of an estimate of the relative frequency of queries
  and modifications.
- The primary index is on the field which specifies the sequential order of the data file.
- There can be only one primary index while there can be many secondary indices.
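A secondary index on a non-candidate key can be sketched as a sorted mapping from each search-key value to a list of pointers to all matching records. In this illustration the file is an in-memory list ordered by branch name, list positions play the role of record pointers, and the sample rows and the names `ACCOUNT_FILE` and `build_secondary_index` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical account file stored in primary (branch-name) order.
# Columns: branch name, account number, balance. Sample data only.
ACCOUNT_FILE = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]


def build_secondary_index(records, field):
    """Dense secondary index: every key value maps to pointers to ALL its records."""
    index = defaultdict(list)
    for pointer, record in enumerate(records):
        index[record[field]].append(pointer)
    # Keep the index entries in sorted search-key order.
    return dict(sorted(index.items()))


balance_index = build_secondary_index(ACCOUNT_FILE, field=2)
print(balance_index[700])  # [3, 6, 7]
```

The records with balance 700 sit at scattered positions in the file, which is why the index entry has to list every one of them rather than only the first.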

Q. 1: When is it preferable to use a dense index rather than a sparse index? Explain your
answer.

Answer: It is preferable to use a dense index:
i. When the file is not sorted on the index field, or
ii. When the index file is small compared to the size of memory.

Q. 2: Since indices speed query processing, why might they not be kept on several search
keys? List as many reasons as possible.
Answer: Reasons for not keeping several search indices include:
i. Every index requires additional CPU time and disk I/O overhead during inserts and
   deletions.
ii. Indices on non-primary keys might have to be changed on updates.
iii. For queries which involve conditions on several search keys, efficiency might not be
   bad even if only some of the keys have indices on them. Therefore, database
   performance is improved less by adding indices when many indices already exist.

Q. 3: Is it possible in general to have two primary indices on the same relation for different
search keys? Explain your answer.
Answer: In general, it is not possible to have two primary indices on the same relation for
different keys, because in that case the tuples would have to be stored in a different order
for each of the different search keys.

B+-Tree Index Files

The main disadvantage of the index-sequential file organization is that performance
degrades as the file grows, both for index lookups and for sequential scans through
the data. To overcome this deficiency, we use a B+-tree index.

The B+-tree index structure is the most widely used of several index structures that
maintain their efficiency despite insertion and deletion of data. This is a balanced tree
in which every path from the root of the tree to a leaf of the tree is of the same length.
A B+-tree index is a multilevel index. A typical node of a B+-tree is shown below.

Fig. A typical node of a B+-tree

- Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
- A leaf node has between ⌈(n-1)/2⌉ and n-1 values.
- Special cases:
  - If the root is not a leaf, it has at least 2 children.
  - If the root is a leaf (that is, there are no other nodes in the tree), it can have
    between 0 and (n-1) values.
- A node contains up to n - 1 search-key values K_1, K_2, ..., K_{n-1}, and n pointers
  P_1, P_2, ..., P_n.
- The search keys in a node are ordered: K_1 < K_2 < K_3 < ... < K_{n-1}.
- For leaf nodes, for i = 1, 2, ..., n - 1, pointer P_i points to either a file record with
  search-key value K_i or to a bucket of pointers, each of which points to a file record
  with search-key value K_i.


[Figure: leaf node holding the keys Brighton and Downtown, with pointers into the account file]

Fig. A leaf node of a B+-tree for the account file (n = 3), where the search key is branch-name.

- Since the account file is ordered by branch-name, the pointers in the leaf node point
  directly to the file.
- The nonleaf nodes of the B+-tree form a multilevel (sparse) index on the leaf nodes.
- The structure of nonleaf nodes is the same as that for leaf nodes, except that all
  pointers are pointers to tree nodes.
- A nonleaf node may hold up to n pointers, and must hold at least ⌈n/2⌉ pointers.
- The number of pointers in a node is called the fanout of the node.
- The root node can hold fewer than ⌈n/2⌉ pointers; however, it must hold at least two
  pointers.
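The occupancy rules for B+-tree nodes can be tabulated with a small helper. This is only a sketch of the bounds stated in these notes; the function name `bplus_bounds` and the dictionary keys are hypothetical, and n is the maximum number of pointers per node.

```python
import math


def bplus_bounds(n):
    """Occupancy limits (min, max) for a B+-tree whose nodes hold up to n pointers."""
    return {
        # Nonleaf nodes other than the root: ceil(n/2) to n children.
        "nonleaf_children": (math.ceil(n / 2), n),
        # Leaf nodes other than the root: ceil((n-1)/2) to n-1 values.
        "leaf_values": (math.ceil((n - 1) / 2), n - 1),
        # A root that is not a leaf needs at least 2 children.
        "root_children_if_nonleaf": (2, n),
        # A root that is the only node may hold 0 to n-1 values.
        "root_values_if_leaf": (0, n - 1),
    }


for n in (3, 4, 6, 8):
    print(n, bplus_bounds(n))
```

For n = 4, for instance, a leaf holds 2 to 3 values and an internal node has 2 to 4 children.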

Q. 4: Construct a B+-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
Assume that the tree is initially empty and values are added in ascending order. Construct
B+-trees for the cases where the number of pointers that will fit in one node is as follows:

i. Four
ii. Six
iii. Eight

Answer:

i. For order 4 (n=4)
[Figure: B+-tree for n = 4]

ii. For order 6 (n=6)
[Figure: B+-tree for n = 6]

B-Tree Index Files

B-tree indices are similar to B+-tree indices. The primary distinction between the two
approaches is that a B-tree eliminates the redundant storage of search-key values. A B-tree
allows search-key values to appear only once. Thus, it is necessary to include an additional
pointer field for each search key in a nonleaf node. These additional pointers point to either
file records or buckets for the associated search key.

A generalized B-tree leaf node and a nonleaf node appear in Fig. a and Fig. b
respectively.

Leaf nodes are the same as in B+-trees. In nonleaf nodes, the pointers P_i are the tree
pointers that we used also for B+-trees, while the pointers B_i are bucket or file-record
pointers.
In the generalized B-tree in the figure, there are n - 1 keys in the leaf node, but there are
m - 1 keys in the nonleaf node. This discrepancy occurs because nonleaf nodes must include
pointers B_i, thus reducing the number of search keys that can be held in these nodes.

Advantages of B-Tree indices:

- May use fewer tree nodes than a corresponding B+-tree.
- Sometimes possible to find a search-key value before reaching a leaf node.

Disadvantages of B-Tree indices:

- Only a small fraction of all search-key values are found early.
- Nonleaf nodes are larger, so fan-out is reduced. Thus, B-trees typically have greater
  depth than corresponding B+-trees.
- Insertion and deletion are more complicated than in B+-trees.
- Implementation is harder than for B+-trees.

Q. 5: Construct a B-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
Assume that the tree is initially empty and values are added in ascending order. Construct
B-trees for the cases where the number of pointers that will fit in one node is as follows:
i. Four
ii. Six
iii. Eight
Answer:
i. For order 4 (n=4)

Hashing
- One disadvantage of sequential file organization is that we must use an index
  structure to locate data. File organizations based on the technique of hashing allow us
  to avoid accessing an index structure. Hashing also provides a way of constructing
  indices.
- File organizations based on hashing allow us to find the address of a data item directly
  by computing a function on the search-key value of the desired record.
- Since we do not know at design time precisely which search-key values will be stored
  in the file, a good hash function to choose is one that assigns search-key values to
  buckets such that the distribution is both uniform and random.
Hash File Organization
In a hash file organization, we obtain the address of the disk block, also called the
bucket, containing a desired record directly by computing a function on the search-key
value of the record.
Let K denote the set of all search-key values, and let B denote the set of all bucket
addresses. A hash function h is a function from K to B.
To insert a record with search key K_i, we compute h(K_i), which gives the address of
the bucket for that record. Assume for now that there is space in the bucket to store
the record. Then, the record is stored in that bucket.
To perform a lookup on a search-key value K_i, we simply compute h(K_i), then search
the bucket with that address. Suppose that two search keys, K_5 and K_7, have the same
hash value; that is, h(K_5) = h(K_7). If we perform a lookup on K_5, the bucket h(K_5)
contains records with search-key value K_5 and records with search-key value K_7.
Thus, we have to check the search-key value of every record in the bucket to verify
that the record is one that we want.
Deletion is equally straightforward. If the search-key value of the record to be deleted
is K_i, we compute h(K_i), then search the corresponding bucket for that record, and
delete the record from the bucket.
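The insert, lookup and delete steps described above can be sketched as follows. This is a minimal in-memory model, not a disk-based implementation: buckets are Python lists that never fill up (matching the "assume there is space" simplification), and the class name `HashFile` and its methods are hypothetical.

```python
class HashFile:
    """Toy hash file organization: a fixed array of buckets addressed by h(key)."""

    def __init__(self, num_buckets, hash_fn):
        self.hash_fn = hash_fn
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # h(K_i) gives the address of the bucket for the record.
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, record):
        # Assumption: the bucket always has space.
        self._bucket(key).append((key, record))

    def lookup(self, key):
        # Different keys may share a bucket, so every record's key is checked.
        return [rec for k, rec in self._bucket(key) if k == key]

    def delete(self, key):
        bucket = self._bucket(key)
        kept = [(k, rec) for k, rec in bucket if k != key]
        removed = len(bucket) - len(kept)
        bucket[:] = kept
        return removed


# Usage with a deliberately weak hash function (key length) to force a collision.
f = HashFile(num_buckets=4, hash_fn=len)
f.insert("ab", "record 1")
f.insert("cd", "record 2")  # same bucket as "ab"
print(f.lookup("ab"))  # ['record 1']
```

The lookup filters on the key inside the bucket, which is exactly the verification step needed when two search keys have the same hash value.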

Example of hash file organization

- Let us choose a hash function for the account file using the search key branch-name.
- Suppose we have 26 buckets and we define a hash function that maps names
  beginning with the ith letter of the alphabet to the ith bucket.
  This hash function has the virtue of simplicity, but it fails to provide a uniform
  distribution, since we expect more branch names to begin with such letters as B and R
  than Q and X.
- Instead, we consider 10 buckets and a hash function that computes the sum of the
  binary representations of the characters of a key, then returns the sum modulo the
  number of buckets.
For branch name 'Perryridge':
Bucket no = h(Perryridge) = 5

For branch name 'Round Hill':
Bucket no = h(Round Hill) = 3

For branch name 'Brighton':
Bucket no = h(Brighton) = 3

[Figure: buckets 0 to 9 holding the account records assigned to them by the hash function]

Fig. Hash organization of account file, with branch-name as the key
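The bucket numbers in this example can be reproduced with a short sketch. One assumption is needed that these notes do not state: the ith letter of the alphabet is taken to be represented by the integer i, and non-letter characters such as spaces are ignored. The function name `branch_hash` is hypothetical.

```python
def branch_hash(name, num_buckets=10):
    """Sum the letter codes of a key and reduce modulo the number of buckets.

    Assumption: 'a'/'A' counts as 1, 'b'/'B' as 2, and so on; other
    characters (such as spaces) contribute nothing.
    """
    total = sum(ord(ch) - ord("a") + 1 for ch in name.lower() if ch.isalpha())
    return total % num_buckets


for branch in ("Perryridge", "Round Hill", "Brighton"):
    print(branch, branch_hash(branch))
# Perryridge 5
# Round Hill 3
# Brighton 3
```

Under that letter encoding the sketch gives the same bucket numbers as the example: 5 for Perryridge and 3 for both Round Hill and Brighton.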

Handling of Bucket Overflows

In case of insertion, if the bucket does not have enough space, a bucket overflow is said to
occur. Bucket overflow can occur mainly for two reasons:
- Insufficient buckets. The number of buckets n_B must be chosen such that
  n_B > n_r / f_r, where n_r denotes the total number of records that will be stored and
  f_r denotes the number of records that will fit in a bucket.
- Skew. Some buckets are assigned more records than are others, so a bucket may
  overflow even when other buckets still have space. This situation is called bucket
  skew. Skew can occur for two reasons:
  i. Multiple records may have the same search key.
  ii. The chosen hash function may result in non-uniform distribution of search
      keys.
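The sizing condition n_B > n_r / f_r can be turned into a one-line calculation. The function name `min_buckets` is hypothetical, and the sketch only gives the smallest bucket count that satisfies the inequality; it does not account for skew.

```python
def min_buckets(n_r, f_r):
    """Smallest integer n_B with n_B > n_r / f_r.

    n_r: total number of records to be stored.
    f_r: number of records that fit in one bucket.
    """
    return n_r // f_r + 1


print(min_buckets(100_000, 10))  # 10001
```

In practice a designer would likely choose more buckets than this minimum, since skew can overflow individual buckets even when the total capacity is sufficient.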
We handle bucket overflow by using overflow buckets. If a record must be inserted
into a bucket b, and b is already full, the system provides an overflow bucket for b,
and inserts the record into the overflow bucket. If the overflow bucket is also full, the
system provides another overflow bucket, and so on. All the overflow buckets of a
given bucket are chained together in a linked list. Overflow handling using such a
linked list is called overflow chaining.

[Figure: buckets 0 to 3, with overflow buckets chained to some of the buckets]
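Overflow chaining can be sketched with a linked list of fixed-capacity buckets. This is an in-memory illustration under stated assumptions: the class names `Bucket` and `ChainedHashFile`, the method names, and the capacity parameter are all hypothetical.

```python
class Bucket:
    """Fixed-capacity bucket with a link to its overflow bucket, if any."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None  # next bucket in the chain


class ChainedHashFile:
    def __init__(self, num_buckets, capacity, hash_fn):
        self.capacity = capacity
        self.hash_fn = hash_fn
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]

    def _head(self, key):
        return self.buckets[self.hash_fn(key) % len(self.buckets)]

    def insert(self, key, record):
        bucket = self._head(key)
        # Walk the chain until a bucket with free space is found,
        # providing a new overflow bucket when the chain is exhausted.
        while len(bucket.records) >= bucket.capacity:
            if bucket.overflow is None:
                bucket.overflow = Bucket(self.capacity)
            bucket = bucket.overflow
        bucket.records.append((key, record))

    def lookup(self, key):
        found, bucket = [], self._head(key)
        # The primary bucket and every overflow bucket must be searched.
        while bucket is not None:
            found.extend(rec for k, rec in bucket.records if k == key)
            bucket = bucket.overflow
        return found

    def overflow_count(self, bucket_no):
        count, bucket = 0, self.buckets[bucket_no].overflow
        while bucket is not None:
            count += 1
            bucket = bucket.overflow
        return count


# Usage: one bucket of capacity 2, so the third and fifth inserts extend the chain.
cf = ChainedHashFile(num_buckets=1, capacity=2, hash_fn=lambda key: 0)
for i in range(5):
    cf.insert(f"k{i}", i)
print(cf.overflow_count(0))  # 2
```

A lookup has to follow the whole chain, so long chains caused by skew directly increase the number of bucket accesses.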

Difference between open and closed hashing:

Closed hashing:
- Closed hashing always places keys with the same hash function values in the same
  bucket (including its overflow buckets).
- If a bucket is full, the system inserts records in overflow buckets.
- Different buckets can be of different sizes.
- Overflow buckets are linked together.

Open hashing:
- Open hashing places keys with the same hash function values in a different bucket if
  the bucket is full.
- The set of buckets is fixed; there is no overflow chain.
- Deletion is difficult in open hashing.

Hash Indices

Hashing can be used not only for file organization, but also for index-structure
creation. A hash index organizes the search keys, with their associated pointers, into a
hash file structure.
We construct a hash index as follows. We apply a hash function on a search key to
identify a bucket, and store the key and its associated pointers in the bucket (or in
overflow buckets). The following figure shows a secondary hash index on the account
file, for the search key account-number.

[Figure: hash index buckets holding account numbers, each with a pointer to its record in the account file]

Fig. Hash index on search key account-number of account file

The hash function in the figure computes the sum of the digits of the account number
modulo 7. The hash index has seven buckets, each of size 2. One of the buckets has
three keys mapped to it, so it has an overflow bucket.
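A hash index of this shape can be sketched as follows. The list of account numbers is sample data assumed for illustration, list positions stand in for record pointers, and the names `digit_sum_hash` and `build_hash_index` are hypothetical. Each bucket is modeled as a chain of chunks of at most two entries, where chunk 0 is the primary bucket and later chunks are overflow buckets.

```python
def digit_sum_hash(account_number, num_buckets=7):
    """Sum of the digits of the account number, modulo the number of buckets."""
    return sum(int(ch) for ch in account_number if ch.isdigit()) % num_buckets


def build_hash_index(account_numbers, num_buckets=7, bucket_size=2):
    """Return, per bucket, a chain of chunks of (key, pointer) entries."""
    index = [[[]] for _ in range(num_buckets)]
    for pointer, acct in enumerate(account_numbers):
        chain = index[digit_sum_hash(acct, num_buckets)]
        if len(chain[-1]) >= bucket_size:
            chain.append([])  # provide an overflow bucket
        chain[-1].append((acct, pointer))
    return index


# Sample account numbers, assumed for illustration only.
ACCOUNTS = ["A-217", "A-101", "A-110", "A-215", "A-102",
            "A-201", "A-218", "A-222", "A-305"]

hash_index = build_hash_index(ACCOUNTS)
for bucket_no, chain in enumerate(hash_index):
    print(bucket_no, chain)
```

With this sample data, three account numbers hash to bucket 3, so that bucket is the only one that needs an overflow bucket.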
