United States Patent [19]
Li
[54]
[75] Inventor: Yanhong Li, Scotch Plains, N.J.
[73] Assignee: IDD Enterprises, L.P., New York, N.Y.
[21] Appl. No.: 08/794,428
[22] Filed: Feb. 8, 1997
[51] Int. Cl. G06F 17/30
[52] U.S. Cl. 707/5; 707/10; 707/501
[58] Field of Search 707/2, 4, 5, 10,
707/501
[56] References Cited
U.S. PATENT DOCUMENTS
5,408,655   4/1995   Oren et al.      395/600
5,418,948   5/1995   Turtle           395/600
5,446,891   8/1995   Kaplan et al.    395/600
5,488,725   1/1996   Turtle et al.    707/5
5,835,905   11/1998  Pirolli et al.   707/3
OTHER PUBLICATIONS
Yuwono et al., “Search and Ranking Algorithms for Locating
Resources on the World Wide Web,” IEEE, pp. 164-171, 1996.
Cheong, Fah-Chun, Internet Agents: Spiders, Wanderers,
Brokers and Bots, Chapter 4, Oct. 1995.
Croft et al., “A Retrieval Model for Incorporating Hypertext
Links,” Hypertext ’89 Proceedings, pp. 213-224, Nov. 1989.
Harman, Donna, “Ranking Algorithms,” Information
Retrieval, Chapter 14, pp. 363-371, 1992.
Bichteler et al., “The Combined Use of Bibliographic Coupling
and Cocitation for Document Retrieval,” Journal of
the American Society for Information Science, pp. 278-282
(Jul. 1980).
Dunlop et al., “Hypermedia and Free Text Retrieval,” Information
Processing & Management, vol. 29, No. 3, pp. 287-298 (1993).
Frei et al., “The Use of Semantic Links in Hypertext
Information Retrieval,” Information Processing & Management,
vol. 31, No. 1, pp. 1-13 (1995).
[11] Patent Number: 5,920,859
[45] Date of Patent: Jul. 6, 1999
Primary Examiner: Thomas G. Black
Assistant Examiner: John C. Loomis
Attorney, Agent, or Firm: Marshall, O'Toole, Gerstein,
Murray & Borun
[57] ABSTRACT
A search engine for retrieving documents pertinent to a
query indexes documents in accordance with hyperlinks
pointing to those documents. The indexer traverses the
hypertext database and finds hypertext information including
the address of the document the hyperlinks point to and
the anchor text of each hyperlink. The information is stored
in an inverted index file, which may also be used to calculate
document link vectors for each hyperlink pointing to a
particular document. When a query is entered, the search
engine finds all document vectors for documents having the
query terms in their anchor text. A query vector is also
calculated, and the dot product of the query vector and each
document link vector is calculated. The dot products relating
to a particular document are summed to determine the
relevance ranking for each document.
25 Claims, 6 Drawing Sheets
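By way of illustration only, the ranking summarized in the abstract might be sketched in Python as follows. The function names (tokenize, rank_by_anchor_text), the example.com addresses, and the use of raw term counts as vector weights are assumptions made for this sketch; the abstract does not specify a weighting scheme. Each hyperlink contributes one link vector built from its anchor text, the dot product of that vector with the query vector is taken, and the dot products for all hyperlinks pointing to the same document are summed.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def rank_by_anchor_text(query, links):
    """Rank target documents by summed query/link-vector dot products.

    links is an iterable of (target_address, anchor_text) pairs, one pair
    per hyperlink. Term weights are raw counts, which is an assumption of
    this sketch and not a scheme stated in the abstract.
    """
    query_terms = tokenize(query)
    # Only query terms can contribute to the dot product, so the vectors
    # are restricted to those dimensions.
    vocabulary = sorted(set(query_terms))
    query_vector = [query_terms.count(term) for term in vocabulary]

    scores = defaultdict(float)
    for target, anchor_text in links:
        anchor_terms = tokenize(anchor_text)
        link_vector = [anchor_terms.count(term) for term in vocabulary]
        dot = sum(q * w for q, w in zip(query_vector, link_vector))
        if dot:
            # Sum the dot products of every link pointing to this target.
            scores[target] += dot

    # Highest score first; ties broken by address for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    example_links = [
        ("http://example.com/b", "java tutorial"),
        ("http://example.com/b", "tutorial"),
        ("http://example.com/d", "tutorial on java"),
        ("http://example.com/x", "coffee shop"),
    ]
    for address, score in rank_by_anchor_text("java tutorial", example_links):
        print(address, score)
```

In this sketch a document pointed to by several hyperlinks whose anchor text matches the query accumulates a higher score than a document with a single matching hyperlink, and a document whose incoming anchor text shares no term with the query is not returned at all.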
[Representative drawing: flowchart of query processing, illustrated with the example query "java tutorial" and Documents B and D. Steps: input user query; search inverted file; find documents related to query; find document link vectors; calculate relevance score; sum relevance scores; output score result.]
U.S. Patent    Jul. 6, 1999    Sheet 1 of 6    5,920,859
U.S. Patent    Jul. 6, 1999    Sheet 2 of 6    5,920,859
[Block diagram, blocks from top to bottom: USER QUERY and SEARCH RESULT; USER INTERFACE; RETRIEVAL ENGINE; INDEX FILES (LINK FILE, DOC. VECTOR FILE, INVERTED FILE); INDEX ENGINE; DOCUMENT DATABASE (WEB).]
FIG. 2
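Purely as an illustration, the index engine of FIG. 2 could be approximated in Python as below. The sketch assumes the hypertext documents are HTML pages already held in memory as a mapping from address to markup; the names AnchorCollector and build_index, and the example addresses, are hypothetical and do not come from the patent. It collects, for each hyperlink, the address pointed to and the anchor text, and stores the result as a link list plus an inverted file mapping each anchor-text term to the hyperlinks that contain it.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects (target_address, anchor_text) for each <a href=...> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # address of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative addresses against the containing page.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            anchor_text = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor_text))
            self._href = None


def build_index(pages):
    """Return (links, inverted_file) for a dict of {address: html}.

    links is a list of (target_address, anchor_text); inverted_file maps
    each lowercased anchor-text term to the positions in links where it
    occurs. This layout is an assumption of the sketch.
    """
    links = []
    inverted_file = defaultdict(list)
    for address, html in pages.items():
        collector = AnchorCollector(address)
        collector.feed(html)
        collector.close()
        for target, anchor_text in collector.links:
            link_id = len(links)
            links.append((target, anchor_text))
            for term in sorted(set(anchor_text.lower().split())):
                inverted_file[term].append(link_id)
    return links, inverted_file


if __name__ == "__main__":
    example_pages = {
        "http://a.example/index.html": (
            '<p>See the <a href="/b.html">Java tutorial</a> and '
            '<a href="http://d.example/">tutorial on Java</a>.</p>'
        ),
    }
    links, inverted_file = build_index(example_pages)
    print(links)
    print(dict(inverted_file))
```

Keeping one entry per hyperlink, instead of merging all anchor text for a document, preserves the information needed to form a separate link vector for each hyperlink pointing to a document.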