
BY

BHARAT SHANDILYA
ROLL NO. : 6010902
BACHELOR OF SCIENCE HONOURS COMPUTER SCIENCE
BHASKARACHARYA COLLEGE OF APPLIED SCIENCES
INTRODUCTION
• PAGERANK IS A METHOD OF MEASURING A PAGE’S IMPORTANCE.
• IN A SEARCH, PAGES ARE PREFERRED ACCORDING TO THEIR PAGERANK.
• IT IS GOOGLE’S METHOD OF RANKING PAGES, NAMED AFTER LARRY PAGE, GOOGLE’S CO-FOUNDER.
• OTHER ALGORITHMS ALSO EXIST:
1. HITS
2. WEIGHTED PAGERANK
1. FIND ALL PAGES MATCHING THE KEYWORDS OF THE SEARCH.
2. RANK THEM ACCORDING TO THE KEYWORDS.
3. CALCULATE THE PAGERANKS OF THE PAGES.
4. ADJUST THE RESULT ACCORDING TO THE PAGERANK SCORES.
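THE FOUR STEPS ABOVE CAN BE SKETCHED IN PYTHON AS BELOW. THIS IS A TOY ILLUSTRATION, NOT GOOGLE’S ACTUAL PIPELINE: THE PAGE TEXTS AND PAGERANK VALUES IN `PAGES` ARE MADE UP AND PRECOMPUTED, THE KEYWORD SCORE IS A SIMPLE WORD COUNT, AND MULTIPLYING IT BY THE PAGERANK IS ONLY ONE ASSUMED WAY OF "ADJUSTING" THE RESULT. THE FUNCTION NAME `search` IS HYPOTHETICAL.

```python
# Toy index (made-up data): page name -> (page text, precomputed PageRank).
PAGES = {
    "A": ("pagerank explained with examples", 1.49),
    "B": ("weighted pagerank and hits", 0.78),
    "C": ("pagerank pagerank formula", 1.58),
    "D": ("cooking recipes", 0.15),
}


def search(query, pages=PAGES):
    words = query.lower().split()
    scored = []
    for name, (text, pagerank) in pages.items():
        tokens = text.split()
        # Steps 1-2: keep pages that match the keywords, scored by how often they occur.
        keyword_score = sum(tokens.count(w) for w in words)
        if keyword_score == 0:
            continue
        # Steps 3-4: adjust by the page's PageRank (a simple product, assumed here).
        scored.append((keyword_score * pagerank, name))
    # Highest adjusted score first.
    return [name for _, name in sorted(scored, reverse=True)]


print(search("pagerank"))  # ['C', 'A', 'B']
```

PAGE C COMES FIRST BECAUSE IT BOTH MATCHES THE KEYWORD TWICE AND HAS THE HIGHEST PAGERANK; PAGE D NEVER APPEARS BECAUSE IT DOES NOT MATCH.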
HOW IS PAGERANK DETERMINED?
A FEW THINGS TO KNOW ABOUT GOOGLE’S THEORY

• IF PAGE A LINKS TO PAGE B, THEN PAGE A IS SAYING THAT PAGE B IS AN IMPORTANT PAGE.
• IF A PAGE HAS MORE IMPORTANT LINKS TO IT, THEN ITS LINKS TO OTHER PAGES ALSO BECOME MORE IMPORTANT.
• DANGLING LINKS (LINKS TO PAGES WITH NO OUTGOING LINKS) ARE NOT INCLUDED IN THE CALCULATION OF PAGERANKS.
• WHEN A PAGE HAS SEVERAL LINKS TO ANOTHER PAGE, THEY ARE COUNTED AS ONE LINK.
• WHEN A PAGE LINKS TO ITSELF, THE LINK IS NOT COUNTED.
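THE TWO COUNTING RULES FOR LINKS (SEVERAL LINKS TO THE SAME PAGE COUNT ONCE, A LINK TO ITSELF IS IGNORED) CAN BE APPLIED TO A RAW LINK MAP BEFORE ANY CALCULATION. A MINIMAL PYTHON SKETCH, ASSUMING LINKS ARE STORED AS A DICTIONARY FROM EACH PAGE TO THE PAGES IT LINKS TO; THE NAME `clean_links` IS HYPOTHETICAL, AND THE DANGLING-LINK RULE IS NOT HANDLED HERE.

```python
def clean_links(raw_links):
    """Apply the counting rules to a raw {page: [linked pages]} map."""
    cleaned = {}
    for page, targets in raw_links.items():
        unique = []
        for target in targets:
            # Several links to the same page count once; a link to itself does not count.
            if target != page and target not in unique:
                unique.append(target)
        cleaned[page] = unique
    return cleaned


raw = {"A": ["B", "B", "C", "A"], "B": ["C"], "C": ["A", "C"]}
print(clean_links(raw))  # {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}
```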
FORMULA FOR PAGERANK

PR(A) = (1 - d) + d (PR(t1)/C(t1) + ... + PR(tn)/C(tn))

WHERE:
PR(A) = PAGERANK OF PAGE A
t1, t2, ..., tn = PAGES THAT LINK TO PAGE A
d = DAMPING FACTOR, USUALLY SET TO 0.85
C(ti) = THE NUMBER OF OUTGOING LINKS ON PAGE ti
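THE FORMULA TRANSLATES DIRECTLY INTO CODE. IN THIS PYTHON SKETCH, THE HYPOTHETICAL FUNCTION `page_rank` TAKES ONE (PR(ti), C(ti)) PAIR FOR EACH PAGE ti THAT LINKS TO A, WITH d DEFAULTING TO 0.85.

```python
def page_rank(inbound, d=0.85):
    """PR(A) = (1 - d) + d * (PR(t1)/C(t1) + ... + PR(tn)/C(tn)).

    `inbound` holds one (PR(ti), C(ti)) pair for each page ti that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)


# Page A linked from t1 (PR 1, two outgoing links) and t2 (PR 1, one outgoing link).
print(page_rank([(1.0, 2), (1.0, 1)]))  # ≈ 1.425, i.e. 0.15 + 0.85 * (0.5 + 1)
```

A PAGE WITH NO INBOUND LINKS GETS THE MINIMUM VALUE, 1 - d = 0.15.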
EXAMPLE
STRUCTURE OF FOUR PAGES LINKED TOGETHER
[FIGURE: PAGES A, B, C AND D, LINKED A → B, A → C, B → C, C → A AND D → C]

TO GET NEAR-ACCURATE VALUES, PAGERANK IS CALCULATED ITERATIVELY, MANY TIMES OVER.
TO START WITH, WE GIVE EACH PAGE A PAGERANK OF ONE:
STARTING VALUES: A = 1, B = 1, C = 1, D = 1
APPLYING THE FORMULA TO EACH PAGE, STARTING WITH A, AND USING THE VALUES FROM THE PREVIOUS CYCLE, WE GET THE FOLLOWING PAGERANKS:

PR(A) = 0.15 + 0.85(1) = 1
PR(B) = 0.15 + 0.85(1/2) = 0.575
PR(C) = 0.15 + 0.85(1/2 + 1 + 1) = 2.275
PR(D) = 0.15 + 0.85(0) = 0.15 (D HAS NO INBOUND LINKS)

SECOND CYCLE CALCULATION (ITERATIVE)
CURRENT VALUES: A = 1, B = 0.575, C = 2.275, D = 0.15
ITERATIVE PR’S
PR(A) = 0.15 + 0.85(2.275) = 2.08375
PR(B) = 0.15 + 0.85(1/2) = 0.575
PR(C) = 0.15 + 0.85(1/2 + 0.575 + 0.15) = 1.19125
PR(D) = 0.15

THIRD CYCLE PAGERANK CALCULATION
CURRENT VALUES: A = 2.08375, B = 0.575, C = 1.19125, D = 0.15
ITERATIVE PR’S
PR(A) = 0.15 + 0.85(1.19125) = 1.1625625
PR(B) = 0.15 + 0.85(2.08375/2) = 1.03559375
PR(C) = 0.15 + 0.85(2.08375/2 + 0.575 + 0.15) = 1.65184375
PR(D) = 0.15

FURTHER ITERATIVE CALCULATIONS WILL GIVE MORE ACCURATE RESULTS.


COMPARING THE VALUES, WE SEE THAT PAGE C HAS THE HIGHEST PAGERANK, SINCE ALL THE OTHER PAGES LINK TO IT. PAGE A HAS THE SECOND-HIGHEST PAGERANK, SINCE C LINKS TO PAGE A, WHICH MAKES PAGE A IMPORTANT TOO.
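THE CYCLE-BY-CYCLE CALCULATION CAN BE AUTOMATED AND CONTINUED WITH A SHORT PYTHON SKETCH. THE LINK MAP `LINKS` IS READ OFF THE PR(...) EQUATIONS OF THE EXAMPLE (A → B, A → C, B → C, C → A, D → C), AND EACH CYCLE COMPUTES EVERY NEW VALUE FROM THE PREVIOUS CYCLE’S VALUES. THE FUNCTION NAME `pagerank_cycle` IS HYPOTHETICAL. BY THE FORMULA, PAGE D HAS NO INBOUND LINKS, SO ITS VALUE IS 1 - d = 0.15 FROM THE FIRST CYCLE ON.

```python
# Link structure of the four-page example: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}


def pagerank_cycle(links, pr, d=0.85):
    """One cycle of PR(p) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to p."""
    new = {}
    for page in links:
        incoming = [t for t, outs in links.items() if page in outs]
        new[page] = (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming)
    return new


pr = {page: 1.0 for page in LINKS}  # every page starts with a PageRank of one
for cycle in range(1, 4):
    pr = pagerank_cycle(LINKS, pr)
    print(f"cycle {cycle}:", {p: round(v, 8) for p, v in pr.items()})

# Further cycles until the values stop changing noticeably.
for _ in range(200):
    pr = pagerank_cycle(LINKS, pr)
print("converged:", {p: round(v, 3) for p, v in pr.items()})
```

RUN TO CONVERGENCE, THE VALUES SETTLE AT ABOUT A ≈ 1.490, B ≈ 0.783, C ≈ 1.577 AND D = 0.15. THEY SUM TO 4, THE NUMBER OF PAGES, AND KEEP THE ORDER C > A > B > D.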
EFFECT OF STRUCTURE ON PAGERANK
WE WILL CONSIDER THREE TYPES OF STRUCTURE:

HIERARCHICAL

LOOPING

EXTENSIVE INTERLINKING
[FIGURES: THE SAME FOUR-PAGE SITE (HOME PAGE 1, ABOUT US 1, ABOUT US 2, MORE INFO 1) LINKED HIERARCHICALLY, IN A LOOP, AND WITH EXTENSIVE INTERLINKING]
PAGE          HIERARCHICAL   LOOPING    EXTENSIVE INTERLINKING
HOME PAGE 1   939.1766       469.5883   469.5883
ABOUT US 1    313.0588       469.5883   469.5883
ABOUT US 2    313.0588       469.5883   469.5883
MORE INFO 1   313.0588       469.5883   469.5883
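THE PATTERN IN THE TABLE (HOME PAGE 1 GAINS IN THE HIERARCHY, WHILE LOOPING AND EXTENSIVE INTERLINKING SHARE THE PAGERANK EQUALLY) CAN BE CHECKED WITH A PYTHON SKETCH. THE LINK LAYOUTS BELOW ARE ASSUMPTIONS, SINCE THE FIGURES ARE NOT RECOVERABLE: IN THE HIERARCHY, HOME PAGE 1 LINKS TO EACH SUB-PAGE AND EACH SUB-PAGE LINKS BACK ONLY TO HOME PAGE 1. THE SKETCH USES THE BASIC FORMULA WITH NO LINKS FROM OUTSIDE THE SITE, SO ITS ABSOLUTE NUMBERS DO NOT MATCH THE TABLE, WHOSE SCALE IS NOT STATED.

```python
def pagerank(links, d=0.85, cycles=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)), starting every page at 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(cycles):
        pr = {
            page: (1 - d) + d * sum(pr[t] / len(outs) for t, outs in links.items() if page in outs)
            for page in links
        }
    return pr


HOME, US1, US2, INFO = "HOME PAGE 1", "ABOUT US 1", "ABOUT US 2", "MORE INFO 1"
PAGES = [HOME, US1, US2, INFO]

# Assumed link layouts for the three structures.
STRUCTURES = {
    # Home links to every sub-page; each sub-page links back only to Home.
    "hierarchical": {HOME: [US1, US2, INFO], US1: [HOME], US2: [HOME], INFO: [HOME]},
    # Each page links to the next, and the last one links back to Home.
    "looping": {HOME: [US1], US1: [US2], US2: [INFO], INFO: [HOME]},
    # Every page links to every other page.
    "extensive interlinking": {p: [q for q in PAGES if q != p] for p in PAGES},
}

results = {name: pagerank(links) for name, links in STRUCTURES.items()}
for name, pr in results.items():
    print(name, {p: round(v, 3) for p, v in pr.items()})
```

UNDER THESE ASSUMPTIONS THE HIERARCHY GIVES HOME PAGE 1 ABOUT 1.919 AND EACH SUB-PAGE ABOUT 0.694, WHILE BOTH OTHER STRUCTURES GIVE EVERY PAGE 1.0; ALL THREE STRUCTURES TOTAL 4.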
THE WEIGHTED PAGERANK (WPR) ALGORITHM IS AN EXTENSION OF THE STANDARD PAGERANK ALGORITHM. WPR TAKES INTO ACCOUNT THE IMPORTANCE OF BOTH THE INLINKS AND THE OUTLINKS OF THE PAGES AND DISTRIBUTES RANK SCORES BASED ON THE POPULARITY OF THE PAGES. RESULTS SHOW THAT WPR PERFORMS BETTER THAN THE CONVENTIONAL PAGERANK ALGORITHM IN TERMS OF RETURNING A LARGER NUMBER OF RELEVANT PAGES FOR A GIVEN QUERY.
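THE SLIDES DO NOT GIVE THE WPR FORMULA. A COMMONLY CITED FORMULATION (XING AND GHORBANI, 2004) REPLACES THE EVEN SPLIT PR(t)/C(t) WITH TWO WEIGHTS FOR EACH LINK FROM v TO u: W_in, u’S SHARE OF THE INLINK COUNTS, AND W_out, u’S SHARE OF THE OUTLINK COUNTS, AMONG ALL THE PAGES THAT v LINKS TO. THE PYTHON SKETCH BELOW ASSUMES THAT FORMULATION AND APPLIES IT TO THE LINKS OF THE FOUR-PAGE EXAMPLE; THE FUNCTION NAME `weighted_pagerank` IS HYPOTHETICAL.

```python
def weighted_pagerank(links, d=0.85, cycles=200):
    """WPR(u) = (1 - d) + d * sum over v linking to u of WPR(v) * W_in(v, u) * W_out(v, u).

    Assumed weights (Xing and Ghorbani's formulation):
      W_in(v, u)  = I(u) / sum of I(p) over the pages p that v links to
      W_out(v, u) = O(u) / sum of O(p) over the pages p that v links to
    where I(x) and O(x) are the numbers of inlinks and outlinks of page x.
    """
    inlinks = {p: sum(p in outs for outs in links.values()) for p in links}
    outlinks = {p: len(outs) for p, outs in links.items()}

    def weight(v, u):
        sum_in = sum(inlinks[p] for p in links[v])
        sum_out = sum(outlinks[p] for p in links[v])
        if sum_in == 0 or sum_out == 0:
            return 0.0  # guard against dividing by zero (v links only to pages without outlinks)
        return (inlinks[u] / sum_in) * (outlinks[u] / sum_out)

    wpr = {p: 1.0 for p in links}
    for _ in range(cycles):
        wpr = {
            u: (1 - d) + d * sum(wpr[v] * weight(v, u) for v, outs in links.items() if u in outs)
            for u in links
        }
    return wpr


# Links of the four-page example: A -> B, A -> C, B -> C, C -> A, D -> C.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
print({p: round(v, 4) for p, v in weighted_pagerank(LINKS).items()})
```

UNLIKE PLAIN PAGERANK, A PAGE’S OUTGOING WEIGHTS NEED NOT SUM TO ONE HERE: PAGE A PASSES 1/8 OF ITS SCORE TO B AND 3/8 TO C, FAVOURING C BECAUSE C HAS MORE INLINKS.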
WHEN A SEARCH IS MADE FOR A QUERY, SOME PAGES THAT ARE IRRELEVANT TO IT ARE INCLUDED IN THE RESULT AS WELL. TO ACCOUNT FOR THIS, WE CATEGORIZE THE PAGES INTO FOUR CLASSES BASED ON THEIR RELEVANCE TO THE GIVEN QUERY:

1. VERY RELEVANT PAGES (VR)
2. RELEVANT PAGES (R)
3. WEAKLY RELEVANT PAGES (WR)
4. IRRELEVANT PAGES (IR)
RELEVANCY RULE: THE RELEVANCY OF A PAGE TO A GIVEN QUERY DEPENDS ON ITS CATEGORY AND ITS POSITION IN THE RESULT LIST. THE LARGER THE RELEVANCY VALUE, THE BETTER THE RESULT. THE RELEVANCY VALUE K IS A FUNCTION OF THE CATEGORY AND POSITION OF EACH PAGE:

$$K = \sum_{i \in R(p)} (n - i) \times W_i$$

WHERE i IS THE POSITION OF A PAGE IN THE RESULT LIST R(p), n IS THE NUMBER OF PAGES CONSIDERED, AND
Wi = V1 IF THE iTH PAGE IS VR
Wi = V2 IF THE iTH PAGE IS R
Wi = V3 IF THE iTH PAGE IS WR
Wi = V4 IF THE iTH PAGE IS IR
WITH V1 > V2 > V3 > V4.
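THE RELEVANCY VALUE CAN BE COMPUTED FOR A RANKED RESULT LIST AS BELOW. THIS PYTHON SKETCH ASSUMES THAT POSITIONS i COUNT FROM 1 AND THAT n IS THE LENGTH OF THE LIST; THE WEIGHTS IN `WEIGHTS` AND THE FUNCTION NAME `relevancy` ARE HYPOTHETICAL, WITH THE WEIGHTS CHOSEN ONLY TO SATISFY V1 > V2 > V3 > V4.

```python
# Hypothetical values for V1..V4; the rule only requires V1 > V2 > V3 > V4.
WEIGHTS = {"VR": 1.0, "R": 0.5, "WR": 0.1, "IR": 0.0}


def relevancy(categories, weights=WEIGHTS):
    """K = sum of (n - i) * Wi over the result list, with positions i counted from 1."""
    n = len(categories)
    return sum((n - i) * weights[c] for i, c in enumerate(categories, start=1))


# The same three pages in two orders: ranking the very relevant page first scores higher.
print(relevancy(["VR", "R", "IR"]))  # 2*1.0 + 1*0.5 + 0*0.0 = 2.5
print(relevancy(["IR", "R", "VR"]))  # 2*0.0 + 1*0.5 + 0*1.0 = 0.5
```

BECAUSE (n - i) SHRINKS FURTHER DOWN THE LIST, THE SAME PAGES SCORE HIGHER WHEN THE MORE RELEVANT ONES ARE RANKED NEARER THE TOP.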
SIZE OF    NUMBER OF RELEVANT PAGES  RELEVANCY VALUE (K)
PAGE SET   PAGERANK   WPR            PAGERANK   WPR
10         0          1              0.1        0.5
20         4          3              13.1       16.8
30         4          4              47.1       49.8
40         4          4              82.1       84.8
50         4          4              117.1      119.8
60         5          5              159.6      162.3
70         7          7              211.7      214.4

SIZE OF    NUMBER OF RELEVANT PAGES  RELEVANCY VALUE (K)
PAGE SET   PAGERANK   WPR            PAGERANK   WPR
5          2          3              2          5.5
10         2          4              9.5        22
20         4          4              34.5       57
30         8          5              87.5       99
40         10         8              158.5      159.3
80         16         15             624.8      655.3
100        22         19             999.2      1045.3

• THE TABLES SHOW THAT THE PAGES RETURNED BY WPR ARE MORE RELEVANT THAN THOSE RETURNED BY PAGERANK: FOR EVERY SIZE OF PAGE SET, WPR HAS THE HIGHER RELEVANCY VALUE (K).
• THE TABLES SHOW THAT THE RELEVANT PAGES DETERMINED BY WPR ARE EITHER MORE RELEVANT OR RANKED HIGHER IN THE LIST.
INTERNAL LINKING
A WEBSITE HAS A MAXIMUM AMOUNT OF PAGERANK, WHICH IS DISTRIBUTED AMONG ITS PAGES BY INTERNAL LINKS.
THE MAXIMUM AMOUNT OF PAGERANK IN A SITE INCREASES AS THE NUMBER OF PAGES IN THE SITE INCREASES.
WITH POOR LINKING, A SITE CAN FAIL TO REACH ITS MAXIMUM PAGERANK, BUT IT CAN NEVER EXCEED IT.
TYPES OF LINKS
DANGLING LINKS

INBOUND LINKS

OUTBOUND LINKS
• DANGLING LINKS ARE SIMPLY LINKS THAT POINT TO A PAGE WITH NO OUTGOING LINKS.
• SUCH PAGES CANNOT FORWARD THEIR PAGERANK, SO THEY ACCUMULATE IT. FOR THIS REASON, THEY ARE NOT INCLUDED IN THE CALCULATION OF PAGERANK.
• GOOGLE REMOVES THEM BEFORE IT STARTS CALCULATING PAGERANK AND ADDS THEM BACK WHEN THE CALCULATION IS OVER.

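HERE PAGE D IS A DANGLING PAGE: IT HAS NO OUTGOING LINKS, SO THE LINKS TO IT ARE DANGLING LINKS.
[FIGURE: PAGES A, B, C AND D, WHERE PAGE D HAS NO OUTGOING LINKS]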
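THE REMOVE-THEN-ADD-BACK APPROACH CAN BE SKETCHED IN PYTHON AS BELOW. THE LINK MAP IS HYPOTHETICAL (THE FIGURE IS NOT RECOVERABLE), AND ADDING A DANGLING PAGE BACK WITH ONE PASS OF THE FORMULA, USING THE LINKING PAGES’ ORIGINAL NUMBER OF LINKS, IS AN ASSUMPTION ABOUT HOW THE ADD-BACK STEP WORKS, NOT GOOGLE’S PUBLISHED PROCEDURE.

```python
def pagerank(links, d=0.85, cycles=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)), starting every page at 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(cycles):
        pr = {
            page: (1 - d) + d * sum(pr[t] / len(outs) for t, outs in links.items() if page in outs)
            for page in links
        }
    return pr


D_FACTOR = 0.85
# Hypothetical link map: D is a dangling page (it has no outgoing links).
LINKS = {"A": ["B", "C", "D"], "B": ["C"], "C": ["A"], "D": []}

# 1. Remove the dangling pages, and the links pointing to them, then calculate.
dangling = {p for p, outs in LINKS.items() if not outs}
pruned = {p: [q for q in outs if q not in dangling] for p, outs in LINKS.items() if p not in dangling}
pr = pagerank(pruned, D_FACTOR)

# 2. Add each dangling page back with one pass of the formula, using the final
#    values and each linking page's original number of outgoing links.
for p in dangling:
    incoming = [t for t, outs in LINKS.items() if p in outs]
    pr[p] = (1 - D_FACTOR) + D_FACTOR * sum(pr[t] / len(LINKS[t]) for t in incoming)

print({p: round(v, 4) for p, v in pr.items()})
```

DURING THE CALCULATION A, B AND C SHARE A TOTAL OF 3 WHILE D IS LEFT OUT; D IS THEN ADDED BACK WITH ABOUT 0.480.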
• INBOUND LINKS ARE LINKS INTO THE SITE FROM OUTSIDE.

• THEY ARE ONE WAY TO INCREASE A SITE’S TOTAL PAGERANK.

• THE LINKING PAGE’S PAGERANK IS IMPORTANT, BUT SO IS THE NUMBER OF LINKS GOING FROM THAT PAGE. FOR INSTANCE, IF THE LINKING PAGE HAS A PAGERANK OF 2 AND YOURS IS THE ONLY LINK ON IT, YOUR SITE RECEIVES AN INJECTION OF 0.15 + 0.85(2/1) = 1.85, WHEREAS A LINK THAT SHARES THE PAGE WITH 99 OTHER LINKS INCREASES YOUR SITE BY ONLY 0.15 + 0.85(2/100) = 0.167.
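BOTH FIGURES FOLLOW FROM THE PAGERANK FORMULA WITH A SINGLE INBOUND LINK. A PYTHON SKETCH, WHERE THE HYPOTHETICAL FUNCTION `injection` TAKES THE LINKING PAGE’S PAGERANK AND ITS NUMBER OF OUTGOING LINKS:

```python
def injection(linking_page_pr, links_on_page, d=0.85):
    """(1 - d) + d * PR(t) / C(t) for a page whose only inbound link comes from t."""
    return (1 - d) + d * linking_page_pr / links_on_page


# How the injection from a page with PageRank 2 shrinks as it carries more links.
for links_on_page in (1, 2, 10, 100):
    print(links_on_page, round(injection(2, links_on_page), 3))
```

THE INJECTION FALLS FROM 1.85 WITH ONE LINK ON THE PAGE TO 0.167 WITH 100 LINKS.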
• OUTBOUND LINKS ARE LINKS FROM PAGES INSIDE THE SITE TO OTHER SITES.

• OUTBOUND LINKS TEND TO DECREASE, OR LEAK, THE PAGERANK OF A SITE.

• THE PAGE THAT AN OUTBOUND LINK IS PLACED ON DETERMINES WHICH PAGES SUFFER THE MOST LOSS.

• PAGERANK IS LEAKED ONLY WHEN GOOGLE RECOGNIZES A LINK TO ANOTHER SITE.
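THE LEAK CAN BE SEEN BY ADDING ONE OUTBOUND LINK TO A LOOPING FOUR-PAGE SITE. IN THIS PYTHON SKETCH, THE PAGE NAMES, THE CHOICE OF HOME AS THE PAGE CARRYING THE OUTBOUND LINK, AND AN EXTERNAL PAGE THAT DOES NOT LINK BACK ARE ALL ASSUMPTIONS; ONLY THE SITE’S OWN PAGES ARE SUMMED.

```python
def pagerank(links, d=0.85, cycles=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(t) / C(t)), starting every page at 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(cycles):
        pr = {
            page: (1 - d) + d * sum(pr[t] / len(outs) for t, outs in links.items() if page in outs)
            for page in links
        }
    return pr


SITE = ["HOME", "US1", "US2", "INFO"]
# A looping site, plus an outside page that links nowhere (assumed).
loop = {"HOME": ["US1"], "US1": ["US2"], "US2": ["INFO"], "INFO": ["HOME"], "EXTERNAL": []}
# The same site with one outbound link from HOME to the outside page.
leaky = dict(loop, HOME=["US1", "EXTERNAL"])

for name, links in (("no outbound link", loop), ("outbound link on HOME", leaky)):
    pr = pagerank(links)
    print(name, round(sum(pr[p] for p in SITE), 4), {p: round(pr[p], 3) for p in SITE})
```

THE SITE’S TOTAL DROPS FROM 4 TO ABOUT 2.167. US1, WHICH NOW SHARES HOME’S PAGERANK WITH THE EXTERNAL PAGE, FALLS FURTHEST (FROM 1 TO ABOUT 0.425), SO MOVING THE OUTBOUND LINK TO ANOTHER PAGE WOULD CHANGE WHICH PAGE LOSES MOST.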
• INTRODUCES THE PAGERANK ALGORITHM AND WPR, AN EXTENSION OF PAGERANK.
• EFFECT OF STRUCTURES ON PAGERANKS.
• TYPES OF LINKS IN THE NETWORK.
• WPR IS ABLE TO IDENTIFY A LARGER NUMBER OF RELEVANT PAGES FOR A GIVEN QUERY.
• RELEVANCY OF THE SEARCHED PAGES.
