
An Application of Personalized PageRank Vectors:
Personalized Search Engine

Mehmet S. Aktas 1,2, Mehmet A. Nacar 1,2, and Filippo Menczer 1,3

1 Indiana University, Computer Science Department
Lindley Hall 215, 150 S. Woodlawn Ave.
Bloomington, IN 47405-7104
{maktas,mnacar,fil}@indiana.edu
http://www.cs.indiana.edu
2 Indiana University, Community Grids Labs
501 N. Morton St. Room 222, Bloomington, IN 47404
http://www.communitygrids.iu.edu
3 Indiana University, Informatics School
901 East 10th Street, Bloomington, IN 47408-3912
http://www.informatics.indiana.edu

Abstract. We introduce a tool that applies personalized PageRank vectors: a
personalized search engine. We use precomputed PageRank vectors to rank search
results in favor of user preferences. We describe the design and architecture
of our tool. Using precomputed personalized PageRank vectors, we generate
search results biased toward user preferences such as top-level domain and
regional preferences. We conduct a user study to evaluate the search results of
three ranking methods: similarity-based ranking, plain PageRank, and weighted
(personalized) PageRank. We discuss the results of our user study and evaluate
the benefits of personalized PageRank vectors in personalized search engines.

1 Introduction
The Web is a highly distributed and heterogeneous information environment. The
immense number of documents on the Web poses various challenges for search
engines, among them storage space, crawling speed, computational speed, and
retrieval of the most relevant documents. Intuitively, given a query, the most
relevant documents can be considered the most authoritative documents that
match that query. Recent information retrieval techniques, such as PageRank
[1], [2] and HITS [3], combine the traditional similarity-matching retrieval
method with a notion of link popularity based on the hypertext structure of
the Web.

The PageRank algorithm provides a global ranking of web pages based on their
importance ([1], [4], [15]). For instance, a link from a web page A to another
web page B can be considered a vote by page A for the importance of page B. So,
as the number of in-links of page B increases, its importance increases as
well. PageRank also considers the importance of the in-links: not only the
number of in-links but also the importance of the pages they come from
determines the PageRank of a page. In this scenario, the global ranking of
pages is based on the web's graph structure. Search engines such as Google [8]
use the link structure of the Web to calculate PageRank values for pages. These
values are then used to rerank search results to improve precision.
Comprehensive surveys of the issues related to PageRank are available in
[5][6][7].

The importance of web pages for different users can be better determined if the
PageRank algorithm takes user preferences into consideration. The personalized
PageRank approach was first introduced in [4] and studied by others [10][11] as
a query-dependent ranking mechanism. However, the use and benefits of
personalized PageRank vectors have not been sufficiently studied in
personalized web search applications.

In this paper, we introduce a personalized search engine that uses personalized
PageRank vectors. We study the PageRank ranking method, focusing particularly
on personalized PageRank vectors. We define the relevancy of documents as a
subjective metric that depends heavily on user satisfaction. We emphasize that
preferences should play an important role in calculating PageRank values. We
implement a personalized search engine as an application of personalized
PageRank vectors. We calculate PageRank vectors offline, prior to search,
taking the personal preferences of users into consideration. Our aim is to
further improve precision at low recall. We describe the design and
architecture of our application in detail. We also explore the improvement in
precision gained by using personalized PageRank vectors. In the next section we
discuss the motivations that led us to do this research.
1.1 Motivations
The PageRank algorithm provides an objective view of the web when deciding the
global importance of web pages. However, such objectivity brings various
problems. In some domains, highly ranked and authoritative web pages may be
distributed across various regions, such as Europe, America and Asia. The plain
PageRank algorithm [1][4] does not take users' regional choices into
consideration. So, the search results may also include sites from regions in
which a user has the least interest. A user, however, would be much more
interested in sites that are highly ranked and located in the same region as
the user.

An important consideration for web users is the reliability of information
available on the web. The web has a democratic structure; in other words, the
majority of web sites on the Internet are not monitored for the information
they publish. It is therefore important to choose information sources that are
more likely to be monitored by experts for accuracy and quality. For instance,
a user might favor sites in the education top-level domain, since they tend to
reflect the point of view of a highly educated community. Likewise, another
user might favor government pages, since they tend to be monitored more
strictly for information accuracy than other domains.

So, some web pages with many highly ranked in-links from a top-level domain
might appear to be the most qualified and authoritative information sources on
some topics, whereas they may not be well monitored for information accuracy
and quality. As a result, search engines might return pages that do not satisfy
user needs and preferences.

The importance of a page differs for individuals with different interests,
knowledge and backgrounds. So, the global ranking of a web page does not
necessarily indicate the importance of that page for individual users. It is
important to calculate a personalized view of the importance of pages. To
overcome these problems, we introduce a method based on calculating
personalized PageRank vectors prior to query time.

The outline of the paper is as follows. First, we discuss the use of
personalized PageRank vectors in our application. Second, we explain the design
and architecture of our personalized search engine. Third, we discuss the
experiments we ran to explore the improvements from using personalized PageRank
vectors. Fourth, we present and discuss our results. Finally, we close the
paper with a conclusion.

In the following section, we discuss in detail how we use personalized PageRank
vectors in our search engine.

2 Personalized PageRank Vectors
Personalized PageRank vectors provide a ranking mechanism that creates a
personalized view of the web for individual users. Personalized search engines
are one example application of personalized PageRank vectors. In this section,
we discuss the use of personalized PageRank vectors in our implementation of a
personalized search engine.

The computation of personalized PageRank vectors is done prior to search time.
When calculating the PageRank vectors, predefined user profiles are taken into
consideration. We use the equations in Figure 1 to calculate both plain and
personalized PageRank scores.
Definitions:

$Pr_p(A)$ = plain PageRank score of a page $A$.
$Pr_w(A)$ = weighted (personalized) PageRank score of a page $A$.
$T_i$ = $i$-th parent of page $A$.
$d$ = damping factor.
$w(T_i)$ = normalized weight factor, computed by applying link analysis to parent page $T_i$.
$C(T_i)$ = number of out-links of parent page $T_i$.

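$$Pr_p(A) = (1 - d) + d \left( \frac{Pr(T_1)}{C(T_1)} + \cdots + \frac{Pr(T_n)}{C(T_n)} \right)$$

$$Pr_w(A) = (1 - d) + d \left( w(T_1) \frac{Pr(T_1)}{C(T_1)} + \cdots + w(T_n) \frac{Pr(T_n)}{C(T_n)} \right)$$

Fig. 1. Plain and weighted PageRank equations used in our experiments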
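
The two equations of Fig. 1 can be sketched as a simple fixed-point iteration. The following is a minimal Python illustration, not the authors' Java implementation: the function name `pagerank`, the damping value `d=0.85`, the fixed number of iterations and the starting score of 1.0 are all assumptions made for the example. A page that has no entry in `weight` is given weight 1, so omitting `weight` yields the plain score.

```python
def pagerank(out_links, weight=None, d=0.85, iterations=50):
    """Iterate the Fig. 1 update. out_links maps a page to the pages it links to.

    weight maps a parent page T to its normalized weight factor w(T);
    pages missing from it (or weight=None) use 1.0, i.e. the plain formula.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    # parents[A] lists every page T_i that links to A.
    parents = {p: [] for p in pages}
    for src, targets in out_links.items():
        for dst in targets:
            parents[dst].append(src)

    def w(t):
        return 1.0 if weight is None else weight.get(t, 1.0)

    pr = {p: 1.0 for p in pages}  # assumed starting value
    for _ in range(iterations):
        pr = {
            a: (1 - d) + d * sum(w(t) * pr[t] / len(out_links[t]) for t in parents[a])
            for a in pages
        }
    return pr


graph = {"A": ["B"], "B": ["A"]}
plain = pagerank(graph)
weighted = pagerank(graph, weight={"A": 0.5})  # down-weight votes cast by A
```

In this two-page graph the plain scores stay at 1.0, while halving the weight of page A lowers the score that B receives from it.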

In our design, a user profile consists of six top-level domains and three
regions as choices of user preference. The top-level domains are the commercial
(.com), military (.mil), government (.gov), organization (.org), business
(.net) and education (.edu) domains. The regional choices are Asia, America and
Europe. To calculate personalized PageRank scores, we therefore considered
2^9 = 512 different combinations of user preference choices. So, we precomputed
512 different user profiles.
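
The count of 512 follows from treating each of the nine choices as either selected or not. As a small Python check (the list names and the helper `all_profiles` are hypothetical and used only for illustration), every profile can be enumerated as a subset of the nine choices:

```python
from itertools import combinations

DOMAINS = ["com", "mil", "gov", "org", "net", "edu"]
REGIONS = ["asia", "america", "europe"]


def all_profiles(choices):
    """Return every subset of `choices`, including the empty profile."""
    return [
        frozenset(subset)
        for size in range(len(choices) + 1)
        for subset in combinations(choices, size)
    ]


profiles = all_profiles(DOMAINS + REGIONS)
print(len(profiles))  # 512
```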

An important feature of our approach is that we apply link analysis to the URL
of a web page to compute its personalized PageRank score. After analysis of its
URL, a web page may be classified as belonging to a top-level domain, a region,
both, or neither. Based on this analysis, a URL gets a weight factor. There are
27 possible weight factors for each user profile, which can be summarized as
follows. A URL might belong to only one of the top-level domains (6 cases). A
URL might belong to only one of the regions (3 cases). A URL might belong to
both a region and a top-level domain at the same time (18 different (top-level
domain, region) pairs). The values of these weight factors vary from one user
profile to another. If a page does not qualify for any of the weight factor
categories, its plain PageRank score is calculated. We illustrate this with the
following example.
Example: A government site belonging to the United Kingdom:
http://www.direct.gov.uk
A predefined user profile:
region: America, Europe
top-level domain: government, education

In this example, a personalized PageRank score for a government site belonging
to the United Kingdom can be computed as follows. We analyze the given URL by
looking at its top-level domain and country extension. We do this by examining
its anchor text and comparing the result with our database, in which we store
all possible top-level domain abbreviations as well as abbreviations for
country extensions. Since we already know which country is located in which
region, once we find the country extension we simply look up the corresponding
region.
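
The URL analysis just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `classify` is hypothetical, the small `COUNTRY_TO_REGION` table stands in for the authors' database of country extensions, and the rule that the domain label sits immediately before a country extension is a simplification.

```python
from urllib.parse import urlparse

TOP_LEVEL_DOMAINS = {"com", "mil", "gov", "org", "net", "edu"}
# Illustrative stand-in for the database of country extensions.
COUNTRY_TO_REGION = {"uk": "europe", "de": "europe", "jp": "asia", "ca": "america"}


def classify(url):
    """Return (top_level_domain, region) for a URL; either may be None."""
    labels = (urlparse(url).hostname or "").lower().split(".")
    region = COUNTRY_TO_REGION.get(labels[-1])
    # With a country extension, the domain label is assumed to sit just before it.
    candidate = labels[-2] if region and len(labels) > 1 else labels[-1]
    domain = candidate if candidate in TOP_LEVEL_DOMAINS else None
    return domain, region


print(classify("http://www.direct.gov.uk"))   # ('gov', 'europe')
print(classify("http://www.cs.indiana.edu"))  # ('edu', None)
```

A URL whose host matches neither table yields `(None, None)`, which corresponds to the case where the plain PageRank score is used.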

In this example, http://www.direct.gov.uk belongs to the region Europe and the
top-level domain government. Since the given URL has both a top-level domain
and a region that exist in the given user profile, it gets the highest
normalized weight factor, which is 1 in this example. Table 1 shows the weight
correlation for the example above.
Table 1. Weight correlation table for a predefined user profile with the
following preferences. Region preferences: America, Europe; top-level domain
preferences: education, government.

combinations of top-level     weight factors     normalized
domains and regions                              weight factors
education                     1                  0.5
europe                        2                  0.5
government                    2                  0.25
...                           ...                ...
europe & government           4                  1
...                           ...                ...

A user is expected to enter his/her choices of interest before query time. When
a user poses a query, we retrieve the personalized PageRank vector
corresponding to his/her user profile in order to rerank the hits that satisfy
the query. We multiply the TF-IDF-based similarity score by the PageRank score.
The resulting products form the final ranking scores used to rerank the hits.
By multiplying the similarity-based metric by PageRank scores, we aim to
increase precision at low recall in our implementation of a personalized search
engine.
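
The reranking step amounts to one multiplication per hit followed by a sort. A minimal Python sketch, assuming hits arrive as (URL, similarity) pairs and that the selected PageRank vector is a dictionary keyed by URL (both representations and the name `rerank` are assumptions for illustration):

```python
def rerank(hits, pagerank_vector):
    """Order hits by similarity score multiplied by PageRank score.

    hits: list of (url, similarity) pairs.
    pagerank_vector: dict mapping url to the precomputed PageRank score of
    the vector selected for the user's profile.
    """
    scored = [(url, sim * pagerank_vector[url]) for url, sim in hits]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


hits = [("a", 0.9), ("b", 0.5)]
vector = {"a": 0.2, "b": 1.0}
ranked = rerank(hits, vector)  # "b" now precedes "a"
```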
In the following section, we discuss the design and architecture of our
implementation in detail.

3 Design and Architecture
We designed our personalized search engine as a Java program that uses the
Nutch project [14] as its main search engine, which uses a TF-IDF-based
similarity metric. The implementation consists of two core parts.
The first part of the implementation focuses on calculations that happen prior
to search time. In this part, we precompute a number of personalized PageRank
vectors as well as a plain PageRank vector. Personalized PageRank vectors are
computed based on predefined user profiles. User profiles include choices of
user interest in regions and/or top-level domains of web pages. PageRank
vectors are computed only once, prior to query time, and stored as data files.
In order not to increase online query time, we also implemented an extension to
the Nutch index system so that it can accommodate PageRank scores along with
the existing information such as anchor text, keywords, and similarity score.
This avoids the heavy I/O overhead of reading PageRank results from a file
store or a database. The computation of the PageRank vectors happens after the
creation of a connected web graph. The data structure of the web graph is an
important part of our implementation. We used a compressed sparse row (CSR)
data structure to represent the adjacency matrix. The CSR data structure stores
the row and column index of each entry, and entries are listed one row after
another. We do this by defining a data structure that is a triplet
(i, j, value): we defined a Java object to represent a triplet and a global
array to store the triplet objects. By doing this, we avoid storing zero values
unnecessarily.
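
The triplet storage can be sketched as follows, in Python rather than the Java of the implementation. The names `Triplet`, `to_triplets` and `out_degree` are hypothetical. Note that one (i, j, value) record per nonzero entry is what is usually called coordinate format; textbook CSR replaces the per-entry row index with a row-pointer array, so the sketch follows the description in the text rather than the textbook layout.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    row: int      # index i of the linking (parent) page
    col: int      # index j of the linked page
    value: float  # adjacency matrix entry


def to_triplets(edges):
    """Build a row-ordered triplet list from (source, target) index pairs."""
    return [Triplet(i, j, 1.0) for i, j in sorted(set(edges))]


def out_degree(triplets, n):
    """Count out-links per page, as needed for C(T_i) in Fig. 1."""
    counts = [0] * n
    for t in triplets:
        counts[t.row] += 1
    return counts


triplets = to_triplets([(0, 1), (0, 2), (2, 0)])
print(out_degree(triplets, 3))  # [2, 0, 1]
```

Only the three existing links are stored; the six zero entries of the 3 x 3 adjacency matrix take no space.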

We used global parallel arrays for vertices and PageRank vectors. After the
PageRank vectors are computed, each vector is dumped into an output file. Prior
to calculating personalized PageRank results, we created weight correlation
Java objects for each possible user profile. As mentioned before, our design
presents 9 different choices of interest in top-level domains and regions. This
is why we created 2^9 different weight correlation objects, each corresponding
to a different user profile. Each weight correlation object includes 27
different weight factors for the different user preference combinations of
top-level domains and regions. When calculating the PageRank of a page, a link
analysis is done; based on the result of this analysis, we determine which URL
belongs to which top-level domain and/or region. We used a hash table within
the weight correlation objects to store user preference combinations. After the
PageRank results are computed and stored in a file store, we run our extended
version of the Nutch index system to store the newly created PageRank scores in
the Nutch index database.

The second part of our implementation focuses on online query processing and
our user study. In this part, we implemented various user interfaces using Java
Server Pages. When a query is made, we use the Nutch searcher mechanism to
retrieve the hits. Nutch returns a TF-IDF-based similarity score for each hit.
We reorder the hits based on plain PageRank and personalized PageRank scores.
We use three global arrays to store the ranking scores of the hits for the
three ranking mechanisms: similarity-based, plain PageRank and personalized
PageRank. To include the similarity-based metric in the PageRank ranking
methods, we calculate the final plain and personalized PageRank scores by
multiplying the similarity-based Nutch score by the precalculated PageRank
values.

So far, we have discussed the details of personalized PageRank vectors in our
implementation and the design and architecture of our personalized search
engine. We discuss the experiments and our user study in detail in the
following section.

4 Experiments
We conducted a user study to measure the performance of three ranking methods:
similarity-based, plain PageRank and weighted (personalized) PageRank. In this
study, we asked each volunteer to use our personalized search facility after
entering a user profile into our system. After making a query, each volunteer
was shown search results from the three ranking mechanisms. For each query, the
top 10 results from each ranking method were considered, and these results were
randomly shuffled before they were shown to the volunteers. In total, 5 human
subjects contributed to our user study with 10 queries. We recognize that
recall and precision values depend on whether the human subjects of a user
study are experienced searchers or not. An experienced searcher may affect
recall and precision, either by finding everything on a topic or by finding
only a few relevant results. For this reason, we conducted our user study with
a group of graduate students and did not give out any information about the
main goal of the search engine. Volunteers were only expected to select
relevant URLs that also satisfied their choice of preferences.

To better explain the method used in the user study, we give an example.
Suppose that the Nutch search engine returns at least 10 results satisfying a
query. These results are then reranked by the two PageRank-based ranking
methods. If the top 10 hits from the three ranking mechanisms turn out to be
entirely different from one another, the volunteer is shown 30 hits as the
result of his/her query. If the hits from different ranking mechanisms overlap,
the number of results ranges from 10 to 30.
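
The construction of the combined hit list can be sketched in Python as the shuffled union of the three top-10 lists. The function name `combined_hit_list`, the dictionary layout and the optional `seed` parameter are assumptions introduced for the example.

```python
import random


def combined_hit_list(rankings, k=10, seed=None):
    """Merge the top-k hits of every ranking mechanism and shuffle them.

    rankings: dict mapping a mechanism name to its ordered list of URLs.
    A URL returned by several mechanisms appears only once.
    """
    pool = []
    for urls in rankings.values():
        for url in urls[:k]:
            if url not in pool:
                pool.append(url)
    random.Random(seed).shuffle(pool)
    return pool


disjoint = {
    "similarity": [f"s{i}" for i in range(10)],
    "plain": [f"p{i}" for i in range(10)],
    "personalized": [f"w{i}" for i in range(10)],
}
print(len(combined_hit_list(disjoint)))  # 30
```

With three identical top-10 lists the same call returns 10 URLs, which gives the lower end of the range described above.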

After the results were shown, each user was asked to select the URLs that were
relevant to his/her query and also satisfied his/her choice of preferences.
Suppose a user is shown 30 results for his/her query and selects 8 of the 30 as
relevant. In this case, for each ranking mechanism, recall is the number of
relevant URLs coming from that ranking mechanism divided by the total number of
URLs deemed relevant, which is 8. Likewise, for each ranking mechanism,
precision is the number of relevant URLs coming from that ranking mechanism
divided by the number of results it retrieved, which is 10 in our example.
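
The worked example can be checked with a short Python sketch. The function `precision_recall` is hypothetical, and the assumption that 5 of the 8 selected URLs come from the mechanism under evaluation is made up purely to have concrete numbers.

```python
def precision_recall(top_k, relevant):
    """Precision and recall of one ranking mechanism for one query.

    top_k: URLs the mechanism contributed to the combined list.
    relevant: set of all URLs the user selected as relevant.
    """
    hits = sum(1 for url in top_k if url in relevant)
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return precision, recall


top_10 = [f"u{i}" for i in range(10)]
selected = {"u0", "u1", "u2", "u3", "u4", "x1", "x2", "x3"}  # 8 relevant URLs
print(precision_recall(top_10, selected))  # (0.5, 0.625)
```

Here precision is 5/10 and recall is 5/8, matching the two divisions described in the text.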

Accordingly, when a query is made, we calculate a (precision, recall) pair for
each ranking mechanism. We also calculate the average of all (precision,
recall) pairs over all given queries. The definitions of the parameters and the
calculations used to compute the (precision, recall) pairs are shown below.
Definitions:

$R$ = total set of the ranking mechanisms
$r$ = type of the ranking mechanism
$q$ = query
$i$ = position of the URL in the combined hit list, counted from the first URL in the list to the last, from top to bottom
$retrieved(r, i, q)$ = number of all retrieved documents for ranking mechanism $r$ and query $q$
$relevant(r, i, q)$ = number of retrieved documents that are deemed relevant by evaluating the URLs

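$$\mathit{precision}(r, i, q) = \frac{\mathit{relevant}(r, i, q)}{\sum_{i \in Q} \mathit{retrieved}(r, i, q)}$$

$$\mathit{recall}(r, i, q) = \frac{\mathit{relevant}(r, i, q)}{\sum_{r \in R} \sum_{i \in Q} \mathit{relevant}(r, i, q)}$$

Fig. 2. Definition of the precision and recall formulas used in our user study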

For the experiments, we used crawl data obtained by crawling the web in April
2004, starting from Yahoo Directory [9] subcategories such as Education, Region
and Government. The resulting data set consists of 107,890 URLs and 468,410
edges connecting them in a web graph. The dynamic nature and growth of the web
make it difficult to calculate connectivity-based ranking scores such as
PageRank. The problem of dangling links (nodes that have no known out-links) in
PageRank calculation was explained in [4]. It has also been shown in [15] that
it is possible to compute PageRank scores with missing out-link information
while keeping PageRank errors under control. In our case, to avoid a high error
rate in PageRank calculations and to increase the size of our data, we used an
additional imaginary node to distribute the PageRank from dangling links back
to the graph: dangling links are connected, through the imaginary node, to
source nodes (nodes that have no known in-links). We observed experimentally
that the order of the pages changes with a negligible error rate when an
imaginary node is used to distribute the rank from dangling links back to the
graph.
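
The imaginary-node construction can be sketched in Python as a graph transformation applied before the PageRank computation. The function `add_imaginary_node` and the node label `"__imaginary__"` are hypothetical; the sketch assumes the graph has at least one source node, since otherwise the imaginary node would itself have no out-links.

```python
def add_imaginary_node(out_links, imaginary="__imaginary__"):
    """Link every dangling node to an imaginary node, and that node to every source.

    out_links maps a page to the pages it links to. Dangling nodes have no
    known out-links; source nodes have no known in-links.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    linked_to = {dst for targets in out_links.values() for dst in targets}
    dangling = sorted(p for p in pages if not out_links.get(p))
    sources = sorted(p for p in pages if p not in linked_to)

    patched = {p: list(out_links.get(p, [])) for p in pages}
    for p in dangling:
        patched[p].append(imaginary)
    patched[imaginary] = sources
    return patched


patched = add_imaginary_node({"s": ["a"], "a": ["b"]})
# "b" was dangling and now links to the imaginary node,
# which in turn links back to the source node "s".
```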

For the user study, we designed easy-to-use user interfaces, aiming to reduce
incorrect evaluations that might be caused by complicated interfaces. Our user
study interfaces have a modular structure and are flexible for modification.
They consist of three parts. In the first part, users provide personal
information such as first name, last name and choices of interest in top-level
domains and regions. The first part of our user interface is illustrated in
Figure 4. The second part is where the personalized web search facility is
displayed; users pose their queries using this search facility. The third part
is where the top hits of the three ranking mechanisms are displayed to the
user. In this part, we also provide facilities such as navigating the hits and
selecting relevant pages that satisfy the user query. The third part of our
user interface is shown in Figure 5.

Fig. 4. User Profile Page. The user enters his/her first name, last name and
choices of interest in top-level domains and/or regions to create a user
profile.

Fig. 5. Personalized Web Search Page. The user makes a query using the
personalized web search facility. Search results coming from different ranking
methods are shuffled. The user enters his/her choices of relevant results
without knowing which result belongs to which search method. User selections
are saved after each query.

5 Results
The precision and recall values are summarized by the plot in Figure 6 below.
These values were gathered by averaging the (precision, recall) pairs of all
queries posed by the users. To emphasize the difference in precision between
the ranking mechanisms, we applied logarithmic interpolation to each line of
the graph.

Fig. 6. Comparison of precision and recall for the three ranking mechanisms.

The most significant conclusion we can draw from the plot is that personalized
PageRank vectors provide better precision than the other two ranking
mechanisms. We can also see that both PageRank-based ranking methods outperform
the similarity-based ranking method by providing better precision. Based on
these results, we manage to provide a personalized view of the importance of
web pages and a better ranking mechanism in our personalized web search engine.

6 Related Work
Personalized search engines were first introduced as an application of
personalized PageRank vectors in [4], but were never explored. There are 2^n
different personalized PageRank vectors for all possible user preferences,
which requires an enormous amount of computation and storage. In an attempt to
solve this problem, a method was introduced in [11] that computes only a
limited number of PageRank vectors offline. This method suggests using
partially computed PageRank vectors that are practical to compute and store
offline. Based on this method, personalized PageRank vectors for other possible
user preferences can be computed quickly at query time. In our work, our main
concern is to introduce a prototype personalized search engine on which we can
run experiments to measure the improvement in precision from using personalized
PageRank vectors. To this end, we limit the range of user preferences to
top-level domain preferences and regional preferences. Since the total number
of preference sets was reasonable to compute and store offline, we do not need
to calculate personalized PageRank vectors at query time.

There has been extensive research on calculating PageRank scores efficiently
with various techniques [16][17][18][19]. Since our main focus was to use
personalized PageRank vectors in personalized web search engines and explore
the improvements, we implemented only a prototype, which we did not expect to
scale up to the size of the current web. We also did not anticipate
recalculating the PageRank vectors at frequent intervals. This is why we did
not use these efficient methods of calculating PageRank vectors in our
implementation. The computation time overhead for a limited number of
personalized PageRank vectors was manageable for our research purposes.

Topic-sensitive web search, introduced in [10], and the intelligent random
surfer, introduced in [13], are similar to our work. Both methods suggest
precomputation of personalized PageRank vectors prior to query time and
calculation of PageRank vectors based on query similarity. When a query is
made, the corresponding personalized PageRank vector is selected according to
the similarity between the query and the topic of the PageRank vector. Our work
is similar in that we also precompute personalized PageRank vectors and then
use the corresponding PageRank vector to reorder the hits at query time. Our
main difference, however, is that we perform link analysis, such as top-level
domain analysis and/or regional extension analysis, when calculating and
selecting the corresponding personalized PageRank vectors. Our method does not
require the full content of a page, since we are interested only in the anchor
text of the URLs when deciding on the weight correlations.

7 Conclusions
In this paper, we introduced a tool, a personalized web search engine, as an
application of personalized PageRank vectors. First, we discussed our use of
personalized PageRank vectors in our implementation. Unlike similar research in
the area, we focused on link analysis of URLs, such as top-level domain and
regional extension analysis, when computing the personalized PageRank vectors.
Then, we explained the design and architecture of our system. We designed and
conducted a user study, in which we ran experiments on real crawl data to
explore improvements in precision when user preferences are taken into
consideration in PageRank calculation. Finally, we presented results that show
an improvement in precision when using personalized PageRank vectors.

We conclude that personalized PageRank scores based on URL analysis can provide
higher-quality search results and better precision at low recall. In future
work, we plan to explore efficient ways of calculating PageRank scores to
enable our personalized search engine to scale with the dynamic nature and
growth of the web.

References
1. The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin and Lawrence Page, http://www-db.stanford.edu/~backrub/google.html
2. What can you do with a Web in your Pocket, Sergey Brin, Rajeev Motwani, Larry Page and Terry Winograd, In Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 1998
3. Authoritative sources in a hyperlinked environment, J. M. Kleinberg, Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1998
4. The PageRank Citation Ranking: Bringing Order to the Web, Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd, Technical report, Stanford University Database Group, 1998
5. Searching the web, Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram Raghavan, ACM Transactions on Internet Technology, 2001
6. A survey of eigenvector methods of web information retrieval, Amy N. Langville and Carl D. Meyer, The SIAM Review, 2003. Accepted in December 2003.
7. Deeper inside PageRank, Amy N. Langville and Carl D. Meyer, Internet Mathematics Journal, 2004. Accepted in February 2004.
8. Google Web Site: http://www.google.com
9. Yahoo Directory Web Site: http://dir.yahoo.com
10. Topic-sensitive PageRank, T. H. Haveliwala, In Proceedings of the Eleventh International World Wide Web Conference, Honolulu, Hawaii, May 2002
11. Scaling personalized web search, G. Jeh and J. Widom, Technical report, Stanford University Database Group, 2002
12. An Analytical Comparison of Approaches to Personalizing PageRank, Haveliwala T., Kamvar S., Jeh G., Stanford University Technical Report, 2003
13. The intelligent surfer: Probabilistic combination of link and content information in PageRank, M. Richardson and P. Domingos, In Proceedings of Advances in Neural Information Processing Systems 14, Cambridge, Massachusetts, December 2002
14. Nutch Open Source Search Engine Site: http://www.nutch.org
15. Outlink Estimation for PageRank Computation under Missing Data, Acharyya S., Ghosh J., The Thirteenth International World Wide Web Conference, N.Y., 2004
16. Efficient computation of PageRank, Taher H. Haveliwala, Stanford University Technical Report, 1999
17. Exploiting the block structure of the web for computing PageRank, Kamvar S. D., Haveliwala T. H., Manning C. D., and Golub G. H., Stanford University Technical Report, 2003
18. Extrapolation Methods for Accelerating PageRank Computations, Kamvar S., Haveliwala T., Manning C., Golub G., Proceedings of the Twelfth International World Wide Web Conference, 2003
19. Adaptive Methods for the Computation of PageRank, Kamvar S., Haveliwala T., and Golub G., Technical report, Stanford University, April 2003
