Академический Документы
Профессиональный Документы
Культура Документы
alliendalkachew@gmail.com
page) retrieval in
response to a query
Quite effective (at
some things)
Commercially
successful (some of
them)
Examples of IR systems
Conventional (library catalog): Search by keyword,
title, author, etc.
Other:
Cross language information retrieval,
Music retrieval
Information Retrieval
Information retrieval (IR) is the process of
of) documents
Large collections of documents from various
document collection.
Retrieval systems, such as Google, Yahoo,
Data retrieval
Information retrieval
Data Retrieval
Info Retrieval
Data organization
Structured
Unstructured
Fields
Clear Semantics
No fields (other
(ID, Name, age,) than text and images etc)
Matching
Items wanted
Matching
100%
Sensitive
Relevant
< 50%
Insensitive
Accuracy
Error response
Why is IR so hard?
Traditionnel Information retrieval (IR) Systems
I saw her duck away from the ball falling from the sky
a Lincoln
A word matching query for Abraham Lincoln may well
find the above document.
Retrieval
DB
Browsing
USER
Browsing
Tokenization
Full
text
stop words
stemming
Indexing
Index terms
Structure of an IR System
An Information Retrieval System serves as a bridge
Structure of an IR System
To be effective in its attempt to satisfy
Structure of an IR System
Typical IR Task
Given:
A corpus of textual
natural-language
documents.
A user query in
the form of a
textual string.
Document
corpus
Find:
A ranked set of
documents that
are relevant to the
query.
Quer
y
Strin
g
IR
System
Ranked
Documents
1. Doc1
2. Doc2
3. Doc3
.
.
Web
Spider
Document
corpus
Query
String
IR
System
1. Page1
2. Page2
3. Page3
.
.
Ranked
Documents
The
The Retrieval
Retrieval Process
Process
It is necessary to define the text database
Retrieval
Retrieval Process
Process .
.
Once the logical view of the documents is
The Retrieval
The user then examines the set of ranked
documents in the search
Process
for useful information. Two
Text
User
Interface
Text
Text Operations
logical view
Query Language
& Operations
User
feedback
Query
Searching
Retrieved docs
Ranked docs
Ranking
Logical view
Indexing
DB
manager
Module
Inverted file
Index
Text
Database
supported?
documents)
used?
what is a good model of retrieval?
the system
of precision, recall,
Stemming, stop words, weighting schemes,
matching algorithms
Subsystems of an IR system
The two subsystems of an IR system:
Searching: is an online process of finding relevant
documents in the index list as per users query
Indexing: is an offline process of organizing
documents using keywords extracted from the
collection
Indexing and searching: are unavoidably
connected
searchable
to index one needs an indexing language
there
documents
Documents
text
Tokenize
tokens
Stop list
non-stoplist
Stemming & Normalize
tokens
stemmed
Term weighting
terms
terms with
weights
Index
Index