Problem
RT&T is a large phone company, and they want to provide enhanced caller ID capability:
given a phone number, return the caller's name
phone numbers are in the range 0 to $R = 10^{10} - 1$
n is the number of phone numbers used
want to do this as efficiently as possible
We know two ways to design this dictionary:
a balanced search tree (AVL, red-black) or a skip list with the phone number as the key has O(log n) query time and O(n) space: good space usage and search time, but can we reduce the search time to constant?
a bucket array indexed by the phone number has optimal O(1) query time, but there is a huge amount of wasted space: O(n + R)
Another Solution
A Hash Table is an alternative solution with O(1) expected query time and O(n + N) space, where N is the size of the table
Like an array, but with a function to map the large range of keys into a smaller one
e.g., take the original key, mod the size of the table, and use that as an index
Insert item (4018637639, Roberto) into a table of size 5
4018637639 mod 5 = 4, so item (4018637639, Roberto) is stored in slot 4 of the table
A lookup uses the same process: map the key to an index, then check the array cell at that index
Insert (4018639350, Andy)
And insert (4018632234, Devin). We have a collision!
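The three insertions above can be replayed with a minimal sketch. The names `index_of`, `insert`, and `lookup` are hypothetical helpers introduced here for illustration, and the table deliberately has no collision handling, so the third insertion is reported rather than resolved.

```python
# Minimal sketch: a table of size 5 indexed by key mod N, with no collision handling.
N = 5
table = [None] * N

def index_of(key):
    """Map a phone number to a table index."""
    return key % N

def insert(key, value):
    """Store (key, value); return False if the slot holds a different key."""
    i = index_of(key)
    if table[i] is not None and table[i][0] != key:
        return False  # collision: the slot is taken by another key
    table[i] = (key, value)
    return True

def lookup(key):
    """Map the key to an index, then check the array cell at that index."""
    entry = table[index_of(key)]
    return entry[1] if entry is not None and entry[0] == key else None

print(insert(4018637639, "Roberto"))  # goes to slot 4
print(insert(4018639350, "Andy"))     # goes to slot 0
print(insert(4018632234, "Devin"))    # slot 4 again, so this is the collision
```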
Collision Resolution
How to deal with two keys which map to the same cell of the array?
Use chaining
Set up lists of items with the same index
The expected search/insertion/removal time is O(n/N), provided the indices are uniformly distributed
The performance of the data structure can be fine-tuned by changing the table size N
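One way chaining might look in code is sketched below. The class name `ChainedHashTable` and its methods are assumptions made for illustration, and the sketch assumes non-negative integer keys compressed with key mod N.

```python
class ChainedHashTable:
    """Sketch of chaining: each table cell holds a list of (key, value) items."""

    def __init__(self, size):
        self.size = size                          # table size N
        self.buckets = [[] for _ in range(size)]  # one list per index
        self.count = 0                            # number of stored items n

    def _bucket(self, key):
        # Assumption: integer keys, compressed with key mod N.
        return self.buckets[key % self.size]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                # key already present: replace its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))     # colliding keys share the same list
        self.count += 1

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.count -= 1
                return True
        return False

    def average_chain_length(self):
        return self.count / self.size   # the n/N term in the expected time


t = ChainedHashTable(5)
t.insert(4018637639, "Roberto")
t.insert(4018639350, "Andy")
t.insert(4018632234, "Devin")   # shares index 4 with Roberto
print(t.buckets[4])
```

Changing `size` changes n/N, which is the tuning knob the slide refers to.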
From Keys to Indices
The mapping of keys to indices of a hash table is called a hash function
A hash function is usually the composition of two maps, a hash code map and a compression map.
An essential requirement of the hash function is to map equal keys to equal indices
A good hash function minimizes the probability of collisions
Java provides a hashCode() method for the Object class, which typically returns the 32-bit memory address of the object.
This default hash code would work poorly for Integer and String objects
The hashCode() method should be suitably redefined by classes.
Popular Hash-Code Maps
Integer cast: for numeric types with 32 bits or less, we can reinterpret the bits of the number as an int
Component sum: for numeric types with more than 32 bits (e.g., long and double), we can add the 32-bit components.
Polynomial accumulation: for strings of a natural language, combine the character values (ASCII or Unicode) $a_0 a_1 \ldots a_{n-1}$ by viewing them as the coefficients of a polynomial:
$a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$
The polynomial is computed with Horner's rule, ignoring overflows, at a fixed value x:
$a_0 + x(a_1 + x(a_2 + \cdots + x(a_{n-2} + x a_{n-1}) \cdots))$
The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of 50,000 English words
Why is the component-sum hash code bad for strings?
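A small sketch helps with the question above. The function names are hypothetical, and the 32-bit mask is an assumption used to imitate "ignoring overflows" in a language whose integers do not overflow. The component sum ignores character order, so anagrams get the same code, while polynomial accumulation weights each position differently.

```python
MASK32 = 0xFFFFFFFF  # assumption: keep the low 32 bits to mimic ignoring overflow

def component_sum(s):
    """Add up the character values; the order of the characters is lost."""
    return sum(ord(c) for c in s) & MASK32

def polynomial_hash(s, x=33):
    """Evaluate a_0 + a_1*x + ... + a_{n-1}*x^(n-1) with Horner's rule."""
    h = 0
    for c in reversed(s):               # start from a_{n-1}, the innermost term
        h = (h * x + ord(c)) & MASK32
    return h

for word in ("stop", "pots", "tops"):
    print(word, component_sum(word), polynomial_hash(word))
```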
Popular Compression Maps
Division: h(k) = |k| mod N
the choice $N = 2^k$ is bad because not all the bits are taken into account
the table size N is usually chosen as a prime number
certain patterns in the hash codes are propagated
Multiply, Add, and Divide (MAD): h(k) = |ak + b| mod N
eliminates patterns provided a mod N ≠ 0
same formula used in linear congruential (pseudo) random number generators
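The two compression maps can be sketched as follows. The function names and the sample hash codes (multiples of 8) are assumptions chosen to show how a pattern in the codes survives division by a power of two but spreads out over a prime table size.

```python
def division(k, N):
    """Division compression map: h(k) = |k| mod N."""
    return abs(k) % N

def mad(k, N, a, b):
    """MAD compression map: h(k) = |a*k + b| mod N, requiring a mod N != 0."""
    if a % N == 0:
        raise ValueError("a mod N must be nonzero")
    return abs(a * k + b) % N

# Assumed sample: hash codes that share a pattern (all multiples of 8).
codes = [8 * i for i in range(20)]

slots_pow2 = {division(k, 16) for k in codes}   # N = 2^4
slots_prime = {division(k, 13) for k in codes}  # N prime

print(sorted(slots_pow2))    # only the slots that multiples of 8 can reach
print(len(slots_prime))      # number of distinct slots used with N = 13
print(mad(10, 13, 3, 5))     # (3*10 + 5) mod 13
```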
More on Collisions
A key is mapped to an already occupied table location
what to do?!?
Use a collision handling technique
We've seen Chaining
Can also use Open Addressing
Double Hashing
Linear Probing
Linear Probing
If the current location is used, try the next table location
linear_probing_insert(K)
    if (table is full) error
    probe = h(K)
    while (table[probe] occupied)
        probe = (probe + 1) mod M
    table[probe] = K
Lookups walk along table until the key or an empty slot is found
Uses less memory than chaining. (Don't have to store all those links)
Slower than chaining. (May have to walk along table for a long way.)
Deletion is more complex. (Either mark the deleted slot or fill in the slot by shifting some elements down.)
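The lookup and deletion remarks can be made concrete with a sketch. It takes the "mark the deleted slot" option; the class name, the `EMPTY` and `DELETED` sentinels, and the choice h(K) = K mod M are assumptions for illustration, and keys are assumed to be distinct integers.

```python
EMPTY = object()     # slot never used
DELETED = object()   # slot whose key was removed (the "mark the slot" option)

class LinearProbingTable:
    def __init__(self, M):
        self.M = M
        self.slots = [EMPTY] * M
        self.count = 0                       # number of live keys

    def _h(self, key):
        return key % self.M                  # assumed hash function

    def insert(self, key):
        if self.count == self.M:
            raise OverflowError("table is full")
        probe = self._h(key)
        # A marked slot may be reused, so only live keys count as occupied.
        while self.slots[probe] is not EMPTY and self.slots[probe] is not DELETED:
            probe = (probe + 1) % self.M
        self.slots[probe] = key
        self.count += 1

    def find(self, key):
        """Walk along the table until the key or an empty slot is found."""
        probe = self._h(key)
        for _ in range(self.M):
            slot = self.slots[probe]
            if slot is EMPTY:
                return None                  # never-used slot ends the search
            if slot is not DELETED and slot == key:
                return probe
            probe = (probe + 1) % self.M     # step over marks and other keys
        return None

    def remove(self, key):
        i = self.find(key)
        if i is None:
            return False
        self.slots[i] = DELETED              # mark, so later lookups keep walking
        self.count -= 1
        return True
```

Marking instead of emptying matters: if the slot were reset to empty, a key stored further along the same probe run would become unreachable.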
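Linear Probing Example
h(k) = k mod 13
Insert keys: 18, 41, 22, 44, 59, 32, 31, 73
[Figure: table with slots 0 to 12 after the insertions; slot 2 holds 41, and slots 5 to 11 hold 18, 44, 59, 32, 22, 31, 73]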
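The example can be traced with a short sketch. The insertion order 18, 41, 22, 44, 59, 32, 31, 73 is an assumption (it is consistent with the final table shown in the figure), and `linear_probing_fill` is a hypothetical helper that also records the slots each key probes.

```python
def linear_probing_fill(keys, M=13):
    """Insert keys with h(k) = k mod M and record each key's probe sequence."""
    table = [None] * M
    probes = {}
    for k in keys:            # assumes fewer keys than slots, so a free slot exists
        probe = k % M
        seq = [probe]
        while table[probe] is not None:
            probe = (probe + 1) % M   # occupied: try the next table location
            seq.append(probe)
        table[probe] = k
        probes[k] = seq
    return table, probes

keys = [18, 41, 22, 44, 59, 32, 31, 73]   # assumed insertion order
table, probes = linear_probing_fill(keys)
for k in keys:
    print(k, "probes", probes[k])
print(table)
```

Keys 31 and 73 need the longest walks because earlier collisions have already built a contiguous run of occupied slots.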
Double Hashing
Use two hash functions
If M is prime, eventually will examine every position in the table
double_hash_insert(K)
    if (table is full) error
    probe = h1(K)
    offset = h2(K)
    while (table[probe] occupied)
        probe = (probe + offset) mod M
    table[probe] = K
Many of same (dis)advantages as linear probing
Distributes keys more uniformly than linear probing does
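The remark about prime M can be checked with a small sketch. `probe_sequence` is a hypothetical helper, and the start and offset values are arbitrary choices; the offset is assumed to be nonzero mod M.

```python
def probe_sequence(start, offset, M):
    """Slots visited by repeatedly adding the offset: start, start+offset, ... (mod M)."""
    return [(start + i * offset) % M for i in range(M)]

prime_seq = probe_sequence(start=5, offset=4, M=13)      # M prime
composite_seq = probe_sequence(start=5, offset=4, M=12)  # M shares a factor with the offset

print(sorted(set(prime_seq)))       # every position in the table
print(sorted(set(composite_seq)))   # only a few positions, visited over and over
```

With M = 12 and offset 4 the probe keeps cycling through the same three slots, so an insertion could loop forever even though free slots exist elsewhere.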
Double Hashing Example
h1(K) = K mod 13
h2(K) = 8 − K mod 8
we want h2 to be an offset to add
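A sketch of the example follows, under two assumptions: the second hash function is read as 8 − (K mod 8), which always yields an offset between 1 and 8, and the keys are borrowed from the linear probing example because the keys for this slide are not given.

```python
M = 13

def h1(K):
    return K % 13

def h2(K):
    return 8 - K % 8   # assumed reading of the formula; never 0, so probing always moves

def double_hash_insert(table, K):
    """Insert K, stepping by h2(K) from h1(K) until a free slot is found."""
    if all(slot is not None for slot in table):
        raise OverflowError("table is full")
    probe = h1(K)
    offset = h2(K)
    while table[probe] is not None:
        probe = (probe + offset) % M
    table[probe] = K
    return probe

table = [None] * M
for K in [18, 41, 22, 44, 59, 32, 31, 73]:   # assumed keys, not from this slide
    print(K, "-> slot", double_hash_insert(table, K))
print(table)
```

Keys 18, 44, and 31 all start at slot 5 but step by different offsets (6, 4, and 1), so they do not pile up in one contiguous run the way they do under linear probing.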
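Theoretical Results
Let α = N/M (here N is the number of keys and M is the table size)
the load factor: average number of keys per array index
Analysis is probabilistic, rather than worst-case
Expected Number of Probes
[Table: expected number of probes, with columns Not found and Found]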
Expected Number of Probes vs. Load Factor
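The relationship can also be explored empirically. The sketch below is a simulation, not the analysis from the slides: it assumes linear probing, replaces the hash function with uniformly random start slots, and counts the probes of an unsuccessful search. The function name, table size, trial count, and seed are all arbitrary choices.

```python
import random

def average_probes(alpha, M=1009, trials=2000, seed=0):
    """Estimate the probes needed for a 'not found' search at load factor alpha < 1."""
    rng = random.Random(seed)
    occupied = [False] * M
    for _ in range(int(alpha * M)):       # fill the table up to the load factor
        probe = rng.randrange(M)          # assumption: uniformly random hash values
        while occupied[probe]:
            probe = (probe + 1) % M
        occupied[probe] = True
    total = 0
    for _ in range(trials):               # search for keys that are not in the table
        probe = rng.randrange(M)
        count = 1
        while occupied[probe]:            # walk until an empty slot is found
            probe = (probe + 1) % M
            count += 1
        total += count
    return total / trials

for alpha in (0.1, 0.5, 0.9):
    print(alpha, round(average_probes(alpha), 2))
```

The estimates grow slowly at low load factors and sharply as the table approaches full, which is the shape such a plot is meant to convey.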