Indexing and Hashing
Index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and the use of more storage space.
Basic Concepts
Many queries reference only a small proportion of the records in a file. For example, a query like "Find all accounts at the Perryridge branch" or "Find the balance of account number A-101" references only a fraction of the account records. It is inefficient for the system to read every record and to check the branch-name field for the name "Perryridge," or the account-number field for the value A-101. Ideally, the system should be able to locate these records directly. To allow these forms of access, we design additional structures that we associate with files.
An index for a file in a database system works in much the same way as the index in this textbook. If we want to learn about a particular topic, we can search for the topic in the index at the back of the book, find the pages where it occurs, and then read the pages to find the information we are looking for. The words in the index are in sorted order, making it easy to find the word we are looking for. Moreover, the index is much smaller than the book, further reducing the effort needed to find the words we are looking for.
Database system indices play the same role as book indices or card catalogs in libraries. For example, to retrieve an account record given the account number, the database system would look up an index to find on which disk block the corresponding record resides, and then fetch the disk block to get the account record.
Keeping a sorted list of account numbers would not work well on very large databases with millions of accounts, since the index would itself be very big; further, even though keeping the index sorted reduces the search time, finding an account can still be rather time-consuming. Instead, more sophisticated indexing techniques may be used.
- Insertion time: The time it takes to insert a new data item. This value includes the time it takes to find the correct place to insert the new data item, as well as the time it takes to update the index structure.
- Deletion time: The time it takes to delete a data item. This value includes the time it takes to find the item to be deleted, as well as the time it takes to update the index structure.
- Space overhead: The additional space occupied by an index structure. Provided that the amount of additional space is moderate, it is usually worthwhile to sacrifice the space to achieve improved performance.
Search Key
An attribute or set of attributes used to look up records in a file is called a search key.
Ordered Indices
To gain fast random access to records in a file, we can use an index structure. Each index structure is associated with a particular search key. An ordered index stores the values of the search keys in sorted order, and associates with each search key the records that contain it. A file may have several indices, on different search keys.
Primary (clustered) Index: If the file containing the records is sequentially ordered, a primary index is an index whose search key also defines the sequential order of the file. Primary indices are also called clustering indices.
There are two types of ordered indices, dense and sparse.
Dense Index: An index record appears for every search-key value in the file. In a dense primary index, the index record contains the search-key value and a pointer to the first data record with that search-key value. The rest of the records with the same search-key value are stored sequentially after the first record, since, because the index is a primary one, records are sorted on the same search key.
[Figure: dense index on branch-name for the account file, with one index entry for each of Brighton, Downtown, Mianus, Perryridge, Redwood, and Round Hill]
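The dense primary index described above can be sketched in a few lines. This is a minimal illustration, not the structure a real database system uses on disk: the names `accounts`, `build_dense_index`, and `lookup_dense` are hypothetical, list positions stand in for record pointers, and the sample rows are illustrative values modelled on the account file.

```python
# Hypothetical account file, sorted on the search key (branch name).
# Each record is (account_number, branch_name, balance); values are illustrative.
accounts = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-201", "Perryridge", 900),
    ("A-218", "Perryridge", 700),
]


def build_dense_index(records):
    """Map every search-key value to the position of its first record."""
    index = {}
    for pos, (_, branch, _) in enumerate(records):
        index.setdefault(branch, pos)  # keep only the first occurrence
    return index


def lookup_dense(records, index, branch):
    """Follow the index pointer, then scan sequentially while the key matches."""
    pos = index.get(branch)
    result = []
    while pos is not None and pos < len(records) and records[pos][1] == branch:
        result.append(records[pos])
        pos += 1
    return result


dense = build_dense_index(accounts)
print(lookup_dense(accounts, dense, "Perryridge"))
```

Because the file is sorted on the same key, one pointer per key value is enough; the remaining matches are found by the sequential scan.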
Sparse Index: An index record appears for only some of the search-key values. Each index record contains a search-key value and a pointer to the first data record with that search-key value. To locate a record, we find the index entry with the largest search-key value that is less than or equal to the search-key value for which we are looking. We start at the record pointed to by that index entry, and follow the pointers in the file until we find the desired record.
[Figure: sparse index on branch-name for the account file, with index entries for only some of the branch names]
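The sparse-index lookup rule (find the largest index entry less than or equal to the key, then follow the file) can be sketched as below. The choice of index entries, the sample rows, and the names `sparse_keys`, `sparse_ptrs`, and `lookup_sparse` are assumptions made for illustration; list positions stand in for record pointers.

```python
from bisect import bisect_right

# Hypothetical file sorted on branch name: (branch_name, account_number).
records = [
    ("Brighton", "A-217"),
    ("Downtown", "A-101"),
    ("Downtown", "A-110"),
    ("Mianus", "A-215"),
    ("Perryridge", "A-102"),
    ("Perryridge", "A-201"),
    ("Perryridge", "A-218"),
    ("Redwood", "A-222"),
    ("Round Hill", "A-305"),
]

# Sparse index: entries for only some key values, each pointing at the
# first record with that value (assumed selection of entries).
sparse_keys = ["Brighton", "Mianus", "Redwood"]
sparse_ptrs = [0, 3, 7]


def lookup_sparse(key):
    """Return all records whose search key equals `key`."""
    # Largest index entry whose key is <= the key we are looking for.
    i = bisect_right(sparse_keys, key) - 1
    if i < 0:
        return []  # key sorts before every indexed value
    pos = sparse_ptrs[i]
    # Follow the file sequentially until we reach (or pass) the key.
    while pos < len(records) and records[pos][0] < key:
        pos += 1
    result = []
    while pos < len(records) and records[pos][0] == key:
        result.append(records[pos])
        pos += 1
    return result


print(lookup_sparse("Perryridge"))
```

A lookup for "Perryridge" starts at the "Mianus" entry and scans forward, which is the extra work a sparse index trades for its smaller size.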
Comparison:
It is generally faster to locate a record if we have a dense index rather than a sparse index. However, sparse indices have advantages over dense indices in that they require less space and they impose less maintenance overhead for insertions and deletions.
There is a trade-off that the system designer must make between access time and space overhead. Although the decision regarding this trade-off depends on the specific application, a good compromise is to have a sparse index with one index entry per block. The reason this design is a good trade-off is that the dominant cost in processing a database request is the time that it takes to bring a block from disk into main memory. Once we have brought in the block, the time to scan the entire block is negligible. Using this sparse index, we locate the block containing the record that we are seeking. Thus, unless the record is on an overflow block, we minimize block accesses while keeping the size of the index as small as possible.
Multi-Level Indices
- If the primary index does not fit in memory, access becomes expensive.
- Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it.
  - outer index: a sparse index of the primary index
  - inner index: the primary index file
- If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
- Indices at all levels must be updated on insertion or deletion from the file.
[Figure: multi-level index, showing index blocks and data blocks]
An Example: Consider 100,000 records, 10 per block, at one index record per block. That's 10,000 index records. Even if we can fit 100 index records per block, this is 100 blocks. If the index is too large to be kept in main memory, a search results in several disk reads.
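The arithmetic in this example, and the effect of adding an outer index, can be checked with a short sketch. The function name `index_levels` and its parameters are hypothetical; it assumes one sparse index entry per block at every level.

```python
from math import ceil


def index_levels(num_records, records_per_block, entries_per_index_block):
    """Return (data_blocks, blocks_per_index_level), innermost level first."""
    data_blocks = ceil(num_records / records_per_block)
    entries = data_blocks  # one sparse index entry per data block
    levels = []
    while True:
        blocks = ceil(entries / entries_per_index_block)
        levels.append(blocks)
        if blocks <= 1:
            break  # this level fits in a single block
        entries = blocks  # the next level indexes the blocks of this one
    return data_blocks, levels


data_blocks, levels = index_levels(100_000, 10, 100)
print(data_blocks, levels)  # 10000 [100, 1]
```

With the numbers from the example, the inner index occupies 100 blocks and a single outer-index block is enough to cover it.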
If, however, the search key of a secondary index is not a candidate key, it is not enough to point to just the first record with each search-key value. The remaining records with the same search-key value could be anywhere in the file, since the records are ordered by the search key of the primary index, rather than by the search key of the secondary index. Therefore, a secondary index must contain pointers to all the records.
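A secondary index that points to every matching record can be sketched as a mapping from each key value to a bucket of record positions. The sample rows and the names `accounts` and `build_secondary_index` are illustrative assumptions; list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical account file ordered by branch name (the primary search key).
# Each record is (account_number, branch_name, balance).
accounts = [
    ("A-217", "Brighton", 750),
    ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-218", "Perryridge", 700),
]


def build_secondary_index(records, field):
    """Map each value of a non-ordering field to the positions of ALL its records."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[field]].append(pos)
    return dict(sorted(buckets.items()))  # keep the index in key order


by_balance = build_secondary_index(accounts, field=2)
print(by_balance[700])  # [3, 5]
```

The two records with balance 700 are not adjacent in the file, so a pointer to the first one alone would not be enough to find the second.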
Comparison (Primary vs. Secondary):
- A sequential scan in primary-index order is efficient because records in the file are stored in the same order as the index.
Q. 2: Since indices speed query processing, why might they not be kept on several search keys? List as many reasons as possible.
Answer: Reasons for not keeping several search indices include:
i. Every index requires additional CPU time and disk I/O overhead during inserts, deletions, and updates.
ii. For queries that involve conditions on several search keys, efficiency might not be bad even if only some of the keys have indices on them. Therefore, database performance is improved less by adding indices when many indices already exist.
Q. 3: Is it possible in ... different search keys?
[Figure: a leaf node of the B+-tree index on the account file, with branch-name search-key values and pointers to account records]
Since the account file is ordered by branch-name, the pointers in the leaf node point directly to the file.
The nonleaf nodes of the B+-tree form a multilevel (sparse) index on the leaf nodes. The structure of nonleaf nodes is the same as that for leaf nodes, except that all pointers are pointers to tree nodes.
A nonleaf node may hold up to n pointers, and must hold at least ⌈n/2⌉ pointers. The number of pointers in a node is called the fanout of the node.
The root node must hold at least two pointers.
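The occupancy rules for nonleaf nodes can be written down as a small check. The helper names `pointer_bounds` and `is_valid_nonleaf` are hypothetical; the sketch assumes a tree with more than one node, so that the root is itself a nonleaf node.

```python
from math import ceil


def pointer_bounds(n, is_root=False):
    """(min, max) number of pointers in a nonleaf node when n pointers fit in a node."""
    minimum = 2 if is_root else ceil(n / 2)
    return minimum, n


def is_valid_nonleaf(num_pointers, n, is_root=False):
    """True if the node's fanout lies within the allowed bounds."""
    low, high = pointer_bounds(n, is_root)
    return low <= num_pointers <= high


print(pointer_bounds(4))                # (2, 4)
print(pointer_bounds(7))                # (4, 7)
print(pointer_bounds(7, is_root=True))  # (2, 7)
```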
B-Tree Index Files
B-tree indices are similar to B+-tree indices.
A generalized B-tree leaf node and a nonleaf node appear in Fig. a and Fig. b respectively.
Leaf nodes are the same as in B+-trees. In nonleaf nodes, the pointers Pi are the tree pointers that we used also for B+-trees, while the pointers Bi are bucket or file-record pointers.
In the generalized B-tree in the figure, there are n - 1 keys in the leaf node, but there are m - 1 keys in the nonleaf node. This discrepancy occurs because nonleaf nodes must include pointers Bi, thus reducing the number of search keys that can be held in these nodes.
Nonleaf nodes are larger, so fan-out is reduced. Thus, B-trees typically have greater depth than the corresponding B+-tree.
Q. 5: Construct a B+-tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
Assume that the tree is initially empty and values are added in ascending order. Construct B+-trees for the cases where the number of pointers that will fit in one node is as follows:
i. Four
ii. Six
iii. Eight
Answer:
i. For order 4 (n=4)
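The construction asked for in Q. 5 can be traced with a small in-memory sketch. It assumes one common split convention, which is not stated in the text above: a leaf holds at most n - 1 keys and keeps its first ⌈n/2⌉ keys when it splits, while an overfull nonleaf node keeps its first ⌈(n+1)/2⌉ pointers and moves the middle key up. The names `Node`, `insert`, `build`, and `levels` are hypothetical, and leaf-to-leaf chaining pointers are omitted for brevity.

```python
import bisect
from math import ceil


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []
        self.children = []  # child nodes; stays empty for leaves


def _insert(node, key, n):
    """Insert key below node; return (separator, new_right_node) on a split, else None."""
    if node.leaf:
        bisect.insort(node.keys, key)
        if len(node.keys) <= n - 1:
            return None
        mid = ceil(n / 2)  # keys that stay in the left leaf
        right = Node(leaf=True)
        right.keys, node.keys = node.keys[mid:], node.keys[:mid]
        return right.keys[0], right  # smallest right key is copied up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    sep, child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, child)
    if len(node.children) <= n:
        return None
    mid = ceil((n + 1) / 2)  # pointers that stay in the left node
    right = Node(leaf=False)
    up = node.keys[mid - 1]  # middle key moves up (it is not copied)
    right.keys, right.children = node.keys[mid:], node.children[mid:]
    node.keys, node.children = node.keys[:mid - 1], node.children[:mid]
    return up, right


def insert(root, key, n):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    new_root = Node(leaf=False)
    new_root.keys, new_root.children = [split[0]], [root, split[1]]
    return new_root


def build(keys, n):
    root = Node(leaf=True)
    for k in keys:
        root = insert(root, k, n)
    return root


def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, frontier = [], [root]
    while frontier:
        out.append([nd.keys for nd in frontier])
        frontier = [c for nd in frontier for c in nd.children]
    return out


KEYS = (2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
for n in (4, 6, 8):
    print(n, levels(build(KEYS, n)))
```

Under these assumptions the n = 4 tree has root key 19, nonleaf nodes (5, 11) and (29), and leaves (2, 3), (5, 7), (11, 17), (19, 23), (29, 31). A different split convention could produce a different but equally valid tree.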
Hashing
- One disadvantage of sequential file organization is that we must use an index structure to locate data. File organizations based on the technique of hashing allow us to avoid accessing an index structure. Hashing also provides a way of constructing indices.
- File organizations based on hashing allow us to find the address of a data item directly by computing a function on the search-key value of the desired record.
- Since we do not know at design time precisely which search-key values will be stored in the file, a good hash function to choose is one that assigns search-key values to buckets such that the distribution is both uniform and random.
Hash File Organization
In a hash file organization, we obtain the address of the disk block, also called the bucket, containing a desired record directly by computing a function on the search-key value of the record.
Let K denote the set of all search-key values, and let B denote the set of all bucket addresses. A hash function h is a function from K to B. Let h denote a hash function.
To insert a record with search key Ki, we compute h(Ki), which gives the address of the bucket for that record. Assume for now that there is space in the bucket to store the record. Then, the record is stored in that bucket.
To perform a lookup on a search-key value Ki, we simply compute h(Ki), then search the bucket with that address. Suppose that two search keys, K5 and K7, have the same hash value; that is, h(K5) = h(K7). If we perform a lookup on K5, the bucket h(K5) contains records with search-key values K5 and records with search-key values K7. Thus, we have to check the search-key value of every record in the bucket to verify that the record is one that we want.
Deletion is equally straightforward. If the search-key value of the record to be deleted is Ki, we compute h(Ki), then search the corresponding bucket for that record, and delete the record from the bucket.
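The three operations above can be sketched with in-memory lists standing in for disk buckets. The bucket count, the hash function `h` (sum of character codes modulo the bucket count), and the record layout are assumptions chosen for illustration, not the function used in the figure.

```python
NUM_BUCKETS = 10  # assumed number of buckets

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Illustrative hash function: sum of character codes modulo the bucket count."""
    return sum(ord(c) for c in key) % NUM_BUCKETS


def insert(record):
    """Store the record in the bucket addressed by h(search key)."""
    buckets[h(record[0])].append(record)


def lookup(key):
    """Search one bucket; keys may collide, so every record's key is checked."""
    return [r for r in buckets[h(key)] if r[0] == key]


def delete(key):
    """Remove all records with this key from its bucket; return how many were removed."""
    b = h(key)
    before = len(buckets[b])
    buckets[b] = [r for r in buckets[b] if r[0] != key]
    return before - len(buckets[b])
```

For example, after `insert(("Perryridge", "A-102", 400))`, calling `lookup("Perryridge")` inspects only the single bucket `h("Perryridge")`, and `delete("Perryridge")` removes the record from that same bucket.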
Fig. Hash organization of the account file, with branch-name as the key.
Bucket overflow is handled by using overflow buckets. If a record must be inserted into a bucket b, and b is already full, the system provides an overflow bucket for b, and inserts the record into the overflow bucket. If the overflow bucket is also full, the system provides another overflow bucket, and so on.
Closed hashing:
- If a bucket is full, the system inserts records in overflow buckets.
- Different buckets can be of different sizes.
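Overflow handling with chained overflow buckets can be sketched as a linked list of fixed-capacity buckets. The class name `Bucket`, its methods, and the capacity used in the example are hypothetical, and the sketch assumes a capacity of at least one record.

```python
class Bucket:
    """A fixed-capacity bucket with an optional chain of overflow buckets."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be >= 1
        self.records = []
        self.overflow = None  # next bucket in the overflow chain

    def insert(self, record):
        b = self
        # Walk the chain until a bucket with free space is found,
        # providing a new overflow bucket when the chain runs out.
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(b.capacity)
            b = b.overflow
        b.records.append(record)

    def search(self, key):
        """Check every record in the bucket and in all of its overflow buckets."""
        found, b = [], self
        while b is not None:
            found.extend(r for r in b.records if r[0] == key)
            b = b.overflow
        return found

    def chain_length(self):
        """Number of buckets in the chain, including this one."""
        count, b = 0, self
        while b is not None:
            count, b = count + 1, b.overflow
        return count
```

Inserting five records into a bucket of capacity two yields a chain of three buckets, and a lookup has to visit all of them.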
Open hashing:
- The set of buckets is fixed; there is no overflow chain.
- If a bucket is full, the hash structure places keys in a different bucket.
- Deletion is difficult in open hashing.
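One possible policy for choosing the different bucket is linear probing: try the next bucket in cyclic order until one with free space is found. The text above does not name a policy, so linear probing, the bucket count, the capacity, and the hash function `h` are all assumptions made for this sketch.

```python
NUM_BUCKETS = 4      # fixed set of buckets (assumed size)
BUCKET_CAPACITY = 2  # records per bucket (assumed)

buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Illustrative hash function: sum of character codes modulo the bucket count."""
    return sum(ord(c) for c in key) % NUM_BUCKETS


def insert(key, value):
    """Place the record in bucket h(key), or in the next bucket with free space."""
    start = h(key)
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        if len(buckets[b]) < BUCKET_CAPACITY:
            buckets[b].append((key, value))
            return b
    raise RuntimeError("all buckets are full")


def lookup(key):
    """Probe from bucket h(key) onward; a bucket with free space ends the probe."""
    start = h(key)
    found = []
    for step in range(NUM_BUCKETS):
        b = (start + step) % NUM_BUCKETS
        found.extend(v for k, v in buckets[b] if k == key)
        if len(buckets[b]) < BUCKET_CAPACITY:
            break  # valid only if no record has ever been deleted
    return found
```

The early exit in `lookup` shows why deletion is difficult: removing a record from a full bucket would make it look like the end of a probe sequence and hide records that were placed further along.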
Hash Indices
[Figure: hash index on the account file. The hash function is computed from the digits of the account number. One bucket has three keys mapped to it, so it has an overflow bucket.]