3/26/2016

Hashing

Concept of Hashing
Introduction

The problem at hand is to speed up searching. Consider the problem of searching an array
for a given value. If the array is not sorted, the search might require examining each and every
element of the array. If the array is sorted, we can use binary search and thereby
reduce the worst-case runtime complexity to O(log n). We could search even faster if we
knew in advance the index at which that value is located in the array. Suppose we had
a magic function that would tell us the index for a given value. With this magic function
our search is reduced to just one probe, giving us a constant runtime O(1). Such a function
is called a hash function. A hash function is a function which, when given a key,
generates an address in the table.

An example of a hash function is a book call number. Each book in the library has a unique call number. A call number is like an address: it
tells us where the book is located in the library. Many academic libraries in the United States use the Library of Congress Classification for call
numbers. This system uses a combination of letters and numbers to arrange materials by subject.
A hash function that returns a unique hash number for every object is called a perfect hash function. In practice it is extremely hard to assign unique
numbers to objects. The latter is possible only if you know (or can approximate) the number of objects to be processed.
Thus, we say that our hash function has the following properties:
it always returns a number for an object;
two equal objects always have the same number;
two unequal objects do not always have different numbers.
The procedure for storing objects using a hash function is the following.
Create an array of size M. Choose a hash function h, that is, a mapping from objects into the integers 0, 1, ..., M-1. Put these objects into an
array at the indexes computed via the hash function: index = h(object). Such an array is called a hash table.
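
The procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name TinyHashTable, the table size M = 11 and the helper h (built on Java's hashCode()) are all made up for this sketch, and collisions are ignored, so a later object simply overwrites an earlier one stored at the same index.

```java
class TinyHashTable {
    static final int M = 11;                       // table size, an arbitrary choice for this sketch
    private final Object[] table = new Object[M];

    // Hypothetical h: maps an object to an index in 0..M-1.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int h(Object obj) {
        return Math.floorMod(obj.hashCode(), M);
    }

    void put(Object obj) {
        table[h(obj)] = obj;                       // overwrites whatever was stored at this index
    }

    boolean contains(Object obj) {
        return obj.equals(table[h(obj)]);          // a single probe
    }

    public static void main(String[] args) {
        TinyHashTable t = new TinyHashTable();
        t.put("bread");
        t.put(2009);
        System.out.println(h(2009));               // 7, since 2009 = 11*182 + 7
        System.out.println(t.contains(2009));      // true
        System.out.println(t.contains("beard"));   // false
    }
}
```

Both put and contains touch exactly one array slot, which is where the O(1) search time comes from.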

How to choose a hash function? One approach to creating a hash function is to use Java's hashCode() method. The hashCode() method is
implemented in the Object class and therefore each class in Java inherits it. The hash code provides a numeric representation of an object (this
is somewhat similar to the toString method that gives a text representation of an object). Consider the following code example:
Integer obj1 = Integer.valueOf(2009);
String obj2 = new String("2009");
System.out.println("hashCode for an integer is " + obj1.hashCode());
System.out.println("hashCode for a string is " + obj2.hashCode());

It will print
hashCode for an integer is 2009
hashCode for a string is 1537223

The hashCode method has different implementations in different classes. In the String class, hashCode is computed by the following formula:

https://www.cs.cmu.edu/~adamchik/15-121/lectures/Hashing/hashing.html

s.charAt(0)*31^(n-1) + s.charAt(1)*31^(n-2) + ... + s.charAt(n-1)

where s is a string, n is its length, and ^ denotes exponentiation. An example:
"ABC" = 'A'*31^2 + 'B'*31 + 'C' = 65*31^2 + 66*31 + 67 = 64578

Note that Java's hashCode method might return a negative integer. If a string is long enough, its hash code will be bigger than the largest value
that fits in Java's 32-bit int type. In this case, due to integer overflow, the value returned by hashCode can be negative.
Review the code in HashCodeDemo.java.
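
As a check on the String formula, here is a minimal sketch that evaluates the same polynomial with Horner's rule and compares it with the built-in hashCode(). The class and method names (StringHashSketch, polyHash) are invented for this example and are not part of HashCodeDemo.java.

```java
class StringHashSketch {
    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] by Horner's rule.
    // int arithmetic silently wraps around on overflow, which is how negative values arise.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polyHash("ABC"));                           // 64578
        System.out.println(polyHash("ABC") == "ABC".hashCode());       // true
        // Seven letters are already enough to overflow 32 bits:
        System.out.println(polyHash("aaaaaaa") < 0);                   // true
        System.out.println(polyHash("aaaaaaa") == "aaaaaaa".hashCode()); // true
    }
}
```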

Collisions
When we put objects into a hash table, it is possible that different objects (by the equals() method) have the same hash code. This is called
a collision. Here is an example of a collision. Two different strings, "Aa" and "BB", have the same hash code:
"Aa" = 'A'*31 + 'a' = 2112
"BB" = 'B'*31 + 'B' = 2112

How do we resolve collisions? Where do we put the second and subsequent values that hash to this
same location? There are several approaches to dealing with collisions. One of them is based on the
idea of putting the keys that collide in a linked list. A hash table then is an array of lists. This
technique is called separate chaining collision resolution.
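
A minimal sketch of separate chaining for strings follows. The class name ChainedHashSet, the fixed number of buckets and the helper chainLength are assumptions made for illustration; this is not how java.util.HashSet is written.

```java
import java.util.LinkedList;

class ChainedHashSet {
    private final LinkedList<String>[] buckets;    // the hash table: an array of lists

    @SuppressWarnings("unchecked")
    ChainedHashSet(int m) {
        buckets = new LinkedList[m];
        for (int i = 0; i < m; i++) buckets[i] = new LinkedList<>();
    }

    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Returns false if the key was already present.
    boolean add(String key) {
        LinkedList<String> chain = buckets[index(key)];
        if (chain.contains(key)) return false;
        chain.add(key);
        return true;
    }

    boolean contains(String key) {
        return buckets[index(key)].contains(key);
    }

    // Length of the list that the key hashes to (for demonstration only).
    int chainLength(String key) {
        return buckets[index(key)].size();
    }

    public static void main(String[] args) {
        ChainedHashSet set = new ChainedHashSet(16);
        set.add("Aa");
        set.add("BB");   // same hash code as "Aa", so it lands in the same list
        System.out.println(set.contains("Aa") && set.contains("BB")); // true
        System.out.println(set.chainLength("Aa"));                    // 2
    }
}
```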

The big attraction of using a hash table is constant-time performance for the basic operations add, remove, contains, and size. However,
because of collisions, we cannot guarantee constant runtime in the worst case. Why? Imagine that all our objects collide into the same
index. Then searching for one of them will be equivalent to searching in a list, which takes linear runtime. We can, however, guarantee an
expected constant runtime if we make sure that our lists won't become too long. This is usually implemented by maintaining a load factor that
keeps track of the average length of the lists. If the load factor approaches a threshold set in advance, we create a bigger array and rehash all
elements from the old table into the new one.
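
The load-factor idea can be sketched as follows. The threshold 0.75, the initial size 4, the doubling policy and all names here are assumptions chosen for the example, not values taken from these notes.

```java
import java.util.ArrayList;
import java.util.List;

class ResizingChainedSet {
    private static final double MAX_LOAD = 0.75;   // assumed threshold
    private List<List<String>> buckets = newBuckets(4);
    private int size = 0;

    private static List<List<String>> newBuckets(int m) {
        List<List<String>> b = new ArrayList<>();
        for (int i = 0; i < m; i++) b.add(new ArrayList<>());
        return b;
    }

    private static int index(String key, int m) {
        return Math.floorMod(key.hashCode(), m);
    }

    void add(String key) {
        List<String> chain = buckets.get(index(key, buckets.size()));
        if (chain.contains(key)) return;
        chain.add(key);
        size++;
        // load factor = stored elements / number of lists = average list length
        if ((double) size / buckets.size() > MAX_LOAD) rehash();
    }

    boolean contains(String key) {
        return buckets.get(index(key, buckets.size())).contains(key);
    }

    int capacity() {
        return buckets.size();
    }

    // Doubles the table and re-inserts every element at its new index.
    private void rehash() {
        List<List<String>> old = buckets;
        buckets = newBuckets(2 * old.size());
        for (List<String> chain : old)
            for (String key : chain)
                buckets.get(index(key, buckets.size())).add(key);
    }

    public static void main(String[] args) {
        ResizingChainedSet set = new ResizingChainedSet();
        for (String w : "Nothing is as easy as it looks".split(" ")) set.add(w);
        System.out.println(set.capacity());        // 8: the table doubled once, on the 4th distinct word
        System.out.println(set.contains("easy"));  // true
    }
}
```

Every element has to be re-indexed during rehash because the index depends on the table size.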
Another technique of collision resolution is linear probing. If we cannot insert at index k, we try the next slot k+1. If that one is occupied,
we go to k+2, and so on. This is quite a simple approach, but it requires new thinking about hash tables. Do you always find an empty slot? What
do you do when you reach the end of the table?
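
One possible answer to both questions is sketched below: wrap around to slot 0 at the end of the table, and refuse to insert once every slot is taken. The class LinearProbingSet and its methods are hypothetical, and deletion is left out on purpose because it needs extra care with probing.

```java
class LinearProbingSet {
    private final String[] slots;
    private int size = 0;

    LinearProbingSet(int m) {
        slots = new String[m];
    }

    // Returns the slot holding the key, or -1 if the key is absent.
    int indexOf(String key) {
        int k = Math.floorMod(key.hashCode(), slots.length);
        for (int probes = 0; probes < slots.length; probes++) {
            if (slots[k] == null) return -1;       // an empty slot ends the probe sequence
            if (slots[k].equals(key)) return k;
            k = (k + 1) % slots.length;            // wrap around at the end of the table
        }
        return -1;                                 // every slot was examined
    }

    boolean contains(String key) {
        return indexOf(key) >= 0;
    }

    // Returns false only when the table is full; a real table would grow instead.
    boolean add(String key) {
        if (contains(key)) return true;
        if (size == slots.length) return false;
        int k = Math.floorMod(key.hashCode(), slots.length);
        while (slots[k] != null) {
            k = (k + 1) % slots.length;
        }
        slots[k] = key;
        size++;
        return true;
    }

    public static void main(String[] args) {
        // "Aa", "BB" and "C#" all have hash code 2112, and 2112 mod 7 = 5.
        LinearProbingSet set = new LinearProbingSet(7);
        set.add("Aa");
        set.add("BB");
        set.add("C#");
        System.out.println(set.indexOf("Aa"));     // 5
        System.out.println(set.indexOf("BB"));     // 6
        System.out.println(set.indexOf("C#"));     // 0, after wrapping around
    }
}
```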

HashSet
In this course we are mostly concerned with using hash tables in applications. Java provides the following classes: HashMap, HashSet and some others
(more specialized ones).
HashSet is a regular set: all objects in a set are distinct. Consider this code segment:
String[] words = "Nothing is as easy as it looks".split(" ");
HashSet<String> hs = new HashSet<String>();
for (String x : words) hs.add(x);
System.out.println(hs.size() + " distinct words detected.");
System.out.println(hs);

It prints "6 distinct words detected." The word "as" is stored only once.
HashSet stores and retrieves elements by their content, which is internally converted into an integer by applying a hash function. Elements
of a HashSet are retrieved using an Iterator. The order in which elements are returned depends on their hash codes.
Review the code in HashSetDemo.java.
The following are some of the HashSet methods:
set.add(key) adds the key to the set.
set.contains(key) returns true if the set has that key.
set.iterator() returns an iterator over the elements.
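
The three methods listed above can be exercised with a short sketch. The class HashSetMethodsDemo and its helper distinctWords are invented for this example and are not the contents of HashSetDemo.java.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class HashSetMethodsDemo {
    // Collects the distinct words of a space-separated sentence.
    static Set<String> distinctWords(String sentence) {
        Set<String> set = new HashSet<>();
        for (String w : sentence.split(" ")) {
            set.add(w);                            // add returns false for a duplicate
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = distinctWords("Nothing is as easy as it looks");
        System.out.println(set.contains("easy"));  // true
        System.out.println(set.contains("hard"));  // false
        System.out.println(set.add("as"));         // false: "as" is already in the set

        Iterator<String> it = set.iterator();
        while (it.hasNext()) {
            System.out.println(it.next());         // order depends on the hash codes
        }
    }
}
```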

Spellchecker
You are to implement a simple spellchecker using a hash table. Your spellchecker will be reading from two input files. The first file is a
dictionary located at the URL http://www.andrew.cmu.edu/course/15121/dictionary.txt. The program should read the dictionary and insert the
words into a hash table. After reading the dictionary, it will read a list of words from a second file. The goal of the spellchecker is to determine
the misspelled words in the second file by looking each word up in the dictionary. The program should output each misspelled word.
See the solution here: Spellchecker.java.
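
The core lookup can be sketched without any file handling. This is not the Spellchecker.java solution: the class SpellCheckSketch, the method misspelled, the in-memory word lists and the choice to lower-case each word before the lookup are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SpellCheckSketch {
    // Returns the words that do not occur in the dictionary, in input order.
    static List<String> misspelled(Set<String> dictionary, List<String> words) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            // Assumption: the dictionary is stored in lower case.
            if (!dictionary.contains(w.toLowerCase())) result.add(w);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the words that would be read from dictionary.txt.
        Set<String> dictionary = new HashSet<>(
                Arrays.asList("nothing", "is", "as", "easy", "it", "looks"));
        // Stand-in for the second input file.
        List<String> text = Arrays.asList("Nothing", "is", "as", "eazy", "as", "it", "lokks");
        System.out.println(misspelled(dictionary, text));   // [eazy, lokks]
    }
}
```

Each lookup is a single contains call on the hash table, so checking a document takes expected time proportional to its number of words.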

HashMap
HashMap is a collection class that is designed to store elements as key-value pairs. Maps provide a way of looking up one thing based on the
value of another.

We modify the above code to use the HashMap class to store words along with their frequencies.
String[] data = "Nothing is as easy as it looks".split(" ");
HashMap<String, Integer> hm = new HashMap<String, Integer>();
for (String key : data)
{
    Integer freq = hm.get(key);                 // null if the word has not been seen yet
    if (freq == null) freq = 1; else freq++;
    hm.put(key, freq);
}
System.out.println(hm);

This prints {as=2, Nothing=1, it=1, easy=1, is=1, looks=1}.
HashSet and HashMap will be printed in no particular order. If the order of insertion is important in your application, you should use the
LinkedHashSet and/or LinkedHashMap classes. If you want to print data in sorted order, you should use the TreeSet and/or TreeMap classes.
Review the code in SetMapDemo.java.
The following are some of the HashMap methods:
map.get(key) returns the value associated with that key. If the map does not associate any value with that key then it returns null.
Referring to "map.get(key)" is similar to referring to "A[key]" for an array A.
map.put(key, value) adds the key-value pair to the map. This is similar to "A[key] = value" for an array A.
map.containsKey(key) returns true if the map has that key.
map.containsValue(value) returns true if the map has that value.
map.keySet() returns a set of all keys.
map.values() returns a collection of all values.
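
The methods listed above can be tried on the word-frequency map. The sketch below is not SetMapDemo.java; the names are invented, and it counts with Map.merge (available since Java 8) as a shorter alternative to the explicit get/put pair.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapMethodsDemo {
    // Counts how often each word occurs in a space-separated sentence.
    static Map<String, Integer> frequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String word : text.split(" ")) {
            freq.merge(word, 1, Integer::sum);     // store 1 if absent, otherwise add 1
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> hm = frequencies("Nothing is as easy as it looks");
        System.out.println(hm.get("as"));            // 2
        System.out.println(hm.get("hard"));          // null: no value for this key
        System.out.println(hm.containsKey("easy"));  // true
        System.out.println(hm.containsValue(3));     // false: no word occurs three times
        System.out.println(hm.keySet().size());      // 6
        System.out.println(hm.values().size());      // 6
    }
}
```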

Anagram solver
An anagram is a word or phrase formed by reordering the letters of another word or phrase. Here is a list of words such that the words on each
line are anagrams of each other:
barde, ardeb, bread, debar, beard, bared
bears, saber, bares, baser, braes, sabre

In this program you read a dictionary from the website at http://www.andrew.cmu.edu/course/15121/dictionary.txt and build a Map whose
key is a sorted word (meaning that its characters are sorted in alphabetical order) and whose values are the word's anagrams.

See the solution here: Anagrams.java.
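
The sorted-word key can be sketched as follows, using a short in-memory word list instead of the online dictionary. This is not the Anagrams.java solution; the names AnagramSketch, sortedKey and group are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramSketch {
    // Key shared by all anagrams of a word: its letters in alphabetical order.
    static String sortedKey(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Maps each sorted key to the list of words that produce it.
    static Map<String, List<String>> group(List<String> words) {
        Map<String, List<String>> groups = new HashMap<>();
        for (String w : words) {
            groups.computeIfAbsent(sortedKey(w), k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("bread", "saber", "beard", "bears", "debar");
        Map<String, List<String>> groups = group(words);
        System.out.println(groups.get("abder"));   // [bread, beard, debar]
        System.out.println(groups.get("abers"));   // [saber, bears]
    }
}
```

To find the anagrams of a given word, sort its letters and look the result up in the map.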

Priority Queue
We are often faced with situations in which certain events or elements have higher or lower priorities than others. For example, university
course prerequisites, or emergency vehicles having priority over regular vehicles. A priority queue is like a queue, except that each element is
inserted according to a given priority. The simplest example is provided by real numbers and order relations over them. We can say that the
smallest (or the largest) numerical value has the highest priority. In practice, priority queues are more complex than that. A priority queue is a
data structure containing records with numerical keys (priorities) that supports some of the following operations:
Construct a priority queue.
Insert a new item.
Remove an item with the highest priority.
Change the priority.
Merge two priority queues.
Observe that a priority queue is a proper generalization of the stack (remove the newest) and the queue (remove the oldest).
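
Java's standard library offers java.util.PriorityQueue, which by default treats the smallest element as the one with the highest priority. The short sketch below shows construction, insertion and removal; the class PriorityQueueDemo and the helper inPriorityOrder are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueDemo {
    // Inserts all values, then removes them one by one in priority order.
    static List<Integer> inPriorityOrder(int... values) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();  // natural ordering: smallest first
        for (int v : values) pq.add(v);                     // insert a new item
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll());           // remove the item with the highest priority
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inPriorityOrder(5, 1, 3));       // [1, 3, 5]
    }
}
```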

Elementary Implementations
There are numerous options for implementing priority queues. We start with simple implementations based on the use of unordered or ordered
sequences, such as linked lists and arrays. The worst-case costs of the various operations on a priority queue are summarized in this table:

                   insert   deleteMin   remove   findMin   merge
ordered array
ordered list
unordered array    1
unordered list     1

Later on in the course we will see another implementation of a priority queue based on a binary heap.
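
To make the trade-off of an unordered sequence concrete, here is a minimal sketch of a priority queue of integers kept in an unordered ArrayList: insert appends at the end (constant time, amortized for ArrayList), while deleteMin has to scan every element. The class name UnorderedArrayPQ and its methods are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

class UnorderedArrayPQ {
    private final List<Integer> items = new ArrayList<>();

    // Cheap: just append, no ordering is maintained.
    void insert(int x) {
        items.add(x);
    }

    boolean isEmpty() {
        return items.isEmpty();
    }

    // Expensive: a linear scan finds the minimum.
    int deleteMin() {
        if (items.isEmpty()) throw new NoSuchElementException();
        int minIdx = 0;
        for (int i = 1; i < items.size(); i++) {
            if (items.get(i) < items.get(minIdx)) minIdx = i;
        }
        int min = items.get(minIdx);
        // Fill the hole with the last element, so nothing has to be shifted.
        int last = items.remove(items.size() - 1);
        if (minIdx < items.size()) items.set(minIdx, last);
        return min;
    }

    public static void main(String[] args) {
        UnorderedArrayPQ pq = new UnorderedArrayPQ();
        pq.insert(5);
        pq.insert(1);
        pq.insert(3);
        System.out.println(pq.deleteMin());   // 1
        System.out.println(pq.deleteMin());   // 3
        System.out.println(pq.deleteMin());   // 5
    }
}
```

An ordered sequence reverses the costs: the minimum is always at a known position, but each insert must first find the right place.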

Comparable and Comparator interfaces
The Comparable<T> interface contains only one method with the following signature:

public int compareTo(T obj)

The returned value is negative, zero or positive depending on whether this object is less than, equal to or greater than the parameter object. Note the
difference between the equals() and compareTo() methods. In the following code example we design a class of playing cards that can be
compared based on their values:
class Card implements Comparable<Card>
{
    private String suit;
    private int value;

    public Card(String suit, int value)
    {
        this.suit = suit;
        this.value = value;
    }
    public int getValue()
    {
        return value;
    }
    public String getSuit()
    {
        return suit;
    }
    // Orders cards by value only; the suit is ignored.
    public int compareTo(Card x)
    {
        return getValue() - x.getValue();
    }
}

It is important to recognize that if a class implements the Comparable interface then the compareTo() and equals() methods should be correlated in the
sense that if x.compareTo(y)==0, then x.equals(y)==true. The default equals() method compares two objects based on their references
and therefore in the above code example two cards with the same value won't be equal. And a final comment: if the equals() method is
overridden then the hashCode() method must also be overridden, in order to maintain the following property: if x.equals(y)==true, then
x.hashCode()==y.hashCode().
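
One way to keep the three methods consistent is sketched below. The class ValueCard is a hypothetical variant of the card class, written only for this illustration. It treats two cards with the same value as equal regardless of suit, which follows from comparing by value only; whether that is the right notion of equality is a design decision.

```java
import java.util.HashSet;
import java.util.Set;

class ValueCard implements Comparable<ValueCard> {
    private final String suit;
    private final int value;

    ValueCard(String suit, int value) {
        this.suit = suit;
        this.value = value;
    }

    @Override
    public int compareTo(ValueCard x) {
        return Integer.compare(value, x.value);
    }

    // Consistent with compareTo: equal exactly when compareTo returns 0.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ValueCard)) return false;
        return compareTo((ValueCard) o) == 0;
    }

    // Equal cards must have equal hash codes, so hash only the compared field.
    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }

    public static void main(String[] args) {
        ValueCard a = new ValueCard("Hearts", 7);
        ValueCard b = new ValueCard("Spades", 7);
        System.out.println(a.equals(b));                   // true
        System.out.println(a.hashCode() == b.hashCode());  // true
        Set<ValueCard> set = new HashSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size());                    // 1
    }
}
```

Without the hashCode override, a HashSet could place a and b in different lists and keep both, even though equals reports them as equal.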
Suppose we would like to be more flexible and have a different way to compare cards, for example, by suit. The above implementation doesn't
allow us to do this, since there is only one compareTo method in Card. Java provides another interface which we can use to solve this
problem:
public interface Comparator<AnyType>
{
    int compare(AnyType first, AnyType second);
}

Notice that the compare() method takes two arguments, rather than one. Next we demonstrate the way to compare two cards by their suits. This
method is defined in its own class that implements Comparator:
class SuitSort implements Comparator<Card>
{
    public int compare(Card x, Card y)
    {
        return x.getSuit().compareTo(y.getSuit());
    }
}

Objects that implement the Comparable interface can be sorted using the sort() method of the Arrays and Collections classes. In the following
code example, we randomly generate a hand of five cards and sort them by value and then by suit:
// requires java.util.Arrays and java.util.Random
String[] suits = {"Diamonds", "Hearts", "Spades", "Clubs"};
Card[] hand = new Card[5];
Random rand = new Random();
for (int i = 0; i < 5; i++)
    hand[i] = new Card(suits[rand.nextInt(4)], rand.nextInt(12));
System.out.println("sort by value");
Arrays.sort(hand);
System.out.println(Arrays.toString(hand));
System.out.println("sort by suit");
Arrays.sort(hand, new SuitSort());
System.out.println(Arrays.toString(hand));

Objects can have several different ways of being compared. Here is another way of comparing cards: first by value and, if the values are the same,
then by suit:
class ValueSuitSort implements Comparator<Card>
{
    public int compare(Card x, Card y)
    {
        int v = x.getValue() - y.getValue();
        // fall back to the suit only when the values tie
        return (v == 0) ? x.getSuit().compareTo(y.getSuit()) : v;
    }
}
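
Since Java 8 the same "by value, then by suit" ordering can also be assembled from the static and default methods of Comparator, without writing a separate class. The sketch below is self-contained, so it declares its own small card class; PlayingCard, its toString format and the constant BY_VALUE_THEN_SUIT are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

class PlayingCard {
    private final String suit;
    private final int value;

    PlayingCard(String suit, int value) {
        this.suit = suit;
        this.value = value;
    }

    int getValue() { return value; }
    String getSuit() { return suit; }

    @Override
    public String toString() { return value + " of " + suit; }
}

class ComparatorChainDemo {
    // Compare by value first; ties are broken by the suit name.
    static final Comparator<PlayingCard> BY_VALUE_THEN_SUIT =
            Comparator.comparingInt(PlayingCard::getValue)
                      .thenComparing(PlayingCard::getSuit);

    public static void main(String[] args) {
        PlayingCard[] hand = {
            new PlayingCard("Spades", 7),
            new PlayingCard("Clubs", 7),
            new PlayingCard("Hearts", 2)
        };
        Arrays.sort(hand, BY_VALUE_THEN_SUIT);
        System.out.println(Arrays.toString(hand));   // [2 of Hearts, 7 of Clubs, 7 of Spades]
    }
}
```

Unlike the subtraction used in compareTo above, comparingInt cannot overflow, which matters once the compared values are not small.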

Victor S. Adamchik, CMU, 2009
