
By
S. DHIVYA (2014614007)
M.E. CSE (I Year)


Time and space
  The shorter the response time and the smaller the space used, the better the system is.

Performance evaluation (for data retrieval)
  Performance of the indexing structure
  The interaction with the operating system
  The delays in communication channels
  The overheads introduced by software layers

Performance evaluation (for information retrieval)
  Besides time and space, retrieval performance is also an issue.

Factors of retrieval performance evaluation

1) Test reference collection
   * a collection of documents
   * a set of example information requests
   * a set of relevant documents (provided by specialists) for each example information request

2) Evaluation measure - goodness of the retrieval strategy S
   - the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists

Retrieval evaluation measures

1) Recall
2) Precision

Alternative measures
1) E-measure
2) Harmonic mean
3) Satisfaction
4) Frustration

Retrieval task - query processing

Batch mode
  The user submits a query and receives an answer back.
  Only the quality of the generated answer set is evaluated, not how it is generated.

Interactive mode
  The user specifies his information need through a series of interactive steps with the system.
  Aspects
    user effort
    characteristics of the interface design
    guidance provided by the system
    duration of the session

Recall
  the fraction of the relevant documents that have been retrieved

    Recall = |Ra| / |R|

Precision
  the fraction of the retrieved documents that are relevant

    Precision = |Ra| / |A|

[Figure: the document collection, containing the relevant documents |R|, the answer set |A|, and their intersection, the relevant documents in the answer set |Ra|]
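Both measures are set-based, so they can be computed directly from the relevant set and the answer set. A minimal sketch in Python (illustrative, not part of the slides; the function name and the example sets are assumptions):

```python
# Illustrative sketch: set-based recall and precision for one query,
# Recall = |Ra| / |R| and Precision = |Ra| / |A|.

def recall_precision(relevant: set, answer: set) -> tuple:
    ra = relevant & answer               # relevant documents in the answer set, Ra
    recall = len(ra) / len(relevant)     # |Ra| / |R|
    precision = len(ra) / len(answer)    # |Ra| / |A|
    return recall, precision

# hypothetical query: 10 relevant documents, 15 retrieved, 5 of them relevant
relevant = {f"d{i}" for i in range(1, 11)}
answer = {f"d{i}" for i in range(6, 21)}
print(recall_precision(relevant, answer))   # (0.5, 0.333...)
```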

The user is not usually presented with all the documents in the answer set A at once.

Example
  Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}

Ranking for query q by a retrieval algorithm, with (precision, recall) at each relevant document:
   1. d123 (100%, 10%)    6. d9   (50%, 30%)   11. d38
   2. d84                 7. d511              12. d48
   3. d56  (66%, 20%)     8. d129              13. d250
   4. d6                  9. d187              14. d113
   5. d8                 10. d25  (40%, 40%)   15. d3  (33%, 50%)
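The (precision, recall) pairs above can be regenerated by walking down the ranking and recording a point each time a relevant document is seen. A short sketch (not from the slides; names are illustrative):

```python
# Illustrative sketch: (precision, recall) points observed while scanning a ranking.

def precision_recall_points(relevant, ranking):
    points, seen = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            points.append((seen / i, seen / len(relevant)))   # (precision, recall)
    return points

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(precision_recall_points(Rq, ranking))
# [(1.0, 0.1), (0.667, 0.2), (0.5, 0.3), (0.4, 0.4), (0.333, 0.5)]
```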

Precision versus recall based on 11 standard recall levels: 0%, 10%, 20%, ..., 100%

[Figure: interpolated precision versus recall curve; precision on the y-axis, recall on the x-axis, both from 0% to 100%]

Average the precision figures at each recall level over all queries:

  P(r) = sum_{i=1}^{Nq} P_i(r) / Nq

  P(r):   the average precision at the recall level r
  Nq:     the number of queries used
  P_i(r): the precision at recall level r for the i-th query
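A minimal sketch of this averaging (not from the slides; the per-query precision values below are hypothetical):

```python
# Illustrative sketch: average P_i(r) over Nq queries at one fixed recall level r.

def average_precision_at_recall(per_query_precision):
    """per_query_precision: one P_i(r) value per query, all at the same recall level r."""
    return sum(per_query_precision) / len(per_query_precision)

# hypothetical precision values of three queries at recall level r = 30%
print(average_precision_at_recall([0.50, 0.33, 0.25]))   # 0.36
```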

Example
  Rq = {d3, d56, d129}

Ranking for query q, with (precision, recall) at each relevant document:
   1. d123                  6. d9                   11. d38
   2. d84                   7. d511                 12. d48
   3. d56 (33.3%, 33.3%)    8. d129 (25%, 66.6%)    13. d250
   4. d6                    9. d187                 14. d113
   5. d8                   10. d25                  15. d3  (20%, 100%)

r_j (j in {0, 1, 2, ..., 10}): a reference to the j-th standard recall level
(e.g., r5 references the recall level 50%)

  P(r_j) = max_{r_j <= r <= r_(j+1)} P(r)

Example
  d56  (33.3%, 33.3%)
  d129 (25%, 66.6%)
  d3   (20%, 100%)

Interpolated precision at the 11 standard recall levels:
  r0: (33.3%, 0%)     r1: (33.3%, 10%)    r2: (33.3%, 20%)    r3: (33.3%, 30%)
  r4: (25%, 40%)      r5: (25%, 50%)      r6: (25%, 60%)
  r7: (20%, 70%)      r8: (20%, 80%)      r9: (20%, 90%)      r10: (20%, 100%)
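A sketch of this interpolation (not from the slides). It uses the common "maximum precision at any recall >= r_j" form of the rule, which reproduces the table above for this example:

```python
# Illustrative sketch: interpolated precision at the 11 standard recall levels.

def interpolated_precision_11pt(pr_points):
    """pr_points: (precision, recall) pairs observed down the ranking."""
    levels = [j / 10 for j in range(11)]                  # 0.0, 0.1, ..., 1.0
    return [max((p for p, r in pr_points if r >= level), default=0.0)
            for level in levels]

# example query Rq = {d3, d56, d129}
points = [(0.333, 0.333), (0.25, 0.666), (0.20, 1.0)]
print(interpolated_precision_11pt(points))
# [0.333, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.20, 0.20, 0.20, 0.20]
```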

The curve of precision versus recall which results from averaging the results for various queries.

[Figure: averaged precision versus recall curve; precision on the y-axis (0-100%), recall on the x-axis (0-100%)]

Average precision at seen relevant documents
  Generate a single value summary of the ranking by averaging the precision figures obtained after each new relevant document is observed.

Example (precision after each relevant document in parentheses)
   1. d123 (1.0)     6. d9   (0.5)    11. d38
   2. d84            7. d511          12. d48
   3. d56  (0.66)    8. d129          13. d250
   4. d6             9. d187          14. d113
   5. d8            10. d25  (0.4)    15. d3  (0.33)

  (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 = 0.57

  Favors systems which retrieve relevant documents quickly.
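A minimal sketch of this single-value summary (not from the slides; the function name is illustrative):

```python
# Illustrative sketch: average precision at seen relevant documents.

def average_precision_at_seen(relevant, ranking):
    precisions, seen = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            precisions.append(seen / i)    # precision right after this relevant doc
    return sum(precisions) / len(precisions) if precisions else 0.0

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(average_precision_at_seen(Rq, ranking))   # (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 ≈ 0.57
```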

Reciprocal Rank (RR)
  Equal to the precision at the first retrieved relevant document.
  Useful for tasks that need only one relevant document, e.g., question answering.

Mean Reciprocal Rank (MRR)
  The mean of RR over several queries.
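A minimal sketch (not from the slides; the two queries below are hypothetical):

```python
# Illustrative sketch: reciprocal rank (RR) of one ranking and
# mean reciprocal rank (MRR) over several queries.

def reciprocal_rank(relevant, ranking):
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i          # precision at the first relevant document
    return 0.0                      # no relevant document retrieved

def mean_reciprocal_rank(queries):
    """queries: list of (relevant_set, ranking) pairs."""
    return sum(reciprocal_rank(rel, rank) for rel, rank in queries) / len(queries)

# hypothetical two-query example
q1 = ({"d3"}, ["d7", "d3", "d9"])        # first relevant at rank 2 -> RR = 0.5
q2 = ({"d5"}, ["d5", "d1"])              # first relevant at rank 1 -> RR = 1.0
print(mean_reciprocal_rank([q1, q2]))    # 0.75
```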

R-Precision
  Generate a single value summary of the ranking by computing the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query.

Example 1 (the first example query, with R = 10 relevant documents)
   1. d123    6. d9
   2. d84     7. d511
   3. d56     8. d129
   4. d6      9. d187
   5. d8     10. d25
  R = 10 and # relevant in the top 10 = 4
  R-precision = 4/10 = 0.4

Example 2 (the second example query, with R = 3 relevant documents)
   1. d123
   2. d84
   3. d56
  R = 3 and # relevant in the top 3 = 1
  R-precision = 1/3 = 0.33
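A minimal sketch that reproduces both examples (not from the slides; names are illustrative):

```python
# Illustrative sketch: R-precision = precision at position R,
# where R is the total number of relevant documents for the query.

def r_precision(relevant, ranking):
    R = len(relevant)
    return sum(1 for doc in ranking[:R] if doc in relevant) / R

Rq1 = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
Rq2 = {"d3", "d56", "d129"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(r_precision(Rq1, ranking))   # 4/10 = 0.4
print(r_precision(Rq2, ranking))   # 1/3 ≈ 0.33
```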

Precision Histograms
  An R-precision graph for several queries.
  Compare the retrieval history of two algorithms.

  RP_{A/B}(i) = RP_A(i) - RP_B(i)

  where RP_A(i) and RP_B(i) are the R-precision values of retrieval algorithms A and B for the i-th query

  RP_{A/B}(i) = 0: both algorithms have equivalent performance for the i-th query
  RP_{A/B}(i) > 0: A has better retrieval performance for query i
  RP_{A/B}(i) < 0: B has better retrieval performance for query i
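A minimal sketch of the per-query differences plotted in such a histogram (not from the slides; the R-precision values below are hypothetical):

```python
# Illustrative sketch: RP_{A/B}(i) = RP_A(i) - RP_B(i), one value per query.

def precision_histogram(rp_a, rp_b):
    """rp_a, rp_b: R-precision values, one per query, for algorithms A and B."""
    return [a - b for a, b in zip(rp_a, rp_b)]

# hypothetical R-precision values over 5 queries
rp_a = [0.4, 0.3, 0.6, 0.2, 0.5]
rp_b = [0.3, 0.5, 0.6, 0.1, 0.7]
print(precision_histogram(rp_a, rp_b))   # [0.1, -0.2, 0.0, 0.1, -0.2]
```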

[Figure: precision histogram - RP_{A/B}(i), ranging from -1.5 to 1.5, plotted against the query number]

Statistical summary regarding the set of all the queries in a retrieval task
  the number of queries used in the task
  the total number of documents retrieved by all queries
  the total number of relevant documents which were effectively retrieved when all queries are considered
  the total number of relevant documents which could have been retrieved by all queries

Limitations of recall and precision
  Estimation of maximal recall requires knowledge of all the documents in the collection.
  Recall and precision capture different aspects of the set of retrieved documents.
  Recall and precision measure the effectiveness over queries in batch mode.
  Recall and precision are defined under the enforcement of a linear ordering of the retrieved documents.

Harmonic mean F(j) of recall and precision

  F(j) = 2 / ( 1/R(j) + 1/P(j) )

  R(j): the recall for the j-th document in the ranking
  P(j): the precision for the j-th document in the ranking

  Equivalently, F = 2PR / (P + R)

Example (Rq = {d3, d56, d129}), with (precision, recall) at each relevant document:
   1. d123                  6. d9                   11. d38
   2. d84                   7. d511                 12. d48
   3. d56 (33.3%, 33.3%)    8. d129 (25%, 66.6%)    13. d250
   4. d6                    9. d187                 14. d113
   5. d8                   10. d25                  15. d3  (20%, 100%)

  F(3)  = 2 / ( 1/0.33 + 1/0.33 ) = 0.33
  F(8)  = 2 / ( 1/0.25 + 1/0.67 ) = 0.36
  F(15) = 2 / ( 1/0.20 + 1/1 )    = 0.33
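A minimal sketch that reproduces these values (not from the slides; the function name is illustrative):

```python
# Illustrative sketch: harmonic mean of precision and recall,
# F(j) = 2 / (1/R(j) + 1/P(j)), equivalently 2PR / (P + R).

def f_measure(precision: float, recall: float) -> float:
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.333, 0.333), 2))   # F(3)  ≈ 0.33
print(round(f_measure(0.25, 0.667), 2))    # F(8)  ≈ 0.36
print(round(f_measure(0.20, 1.0), 2))      # F(15) ≈ 0.33
```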

E evaluation measure
  Allows the user to specify whether he is more interested in recall or in precision.

  E(j) = 1 - (1 + b^2) / ( b^2/R(j) + 1/P(j) )

  The related weighted F measure: F_b = (b^2 + 1) P R / (b^2 P + R), so E(j) = 1 - F_b(j).
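A minimal sketch of the E measure (not from the slides; the precision/recall values used are taken from the earlier example):

```python
# Illustrative sketch: E(j) = 1 - (1 + b^2) / (b^2/R(j) + 1/P(j)).

def e_measure(precision: float, recall: float, b: float) -> float:
    if precision == 0 or recall == 0:
        return 1.0                     # worst possible value
    return 1 - (1 + b * b) / (b * b / recall + 1 / precision)

# with b = 1, E reduces to 1 - F (the complement of the harmonic mean)
print(round(e_measure(0.25, 0.667, b=1.0), 2))   # ≈ 0.64, i.e. 1 - 0.36
```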

Basic assumption of the previous evaluation measures
  The set of relevant documents for a query is the same, independent of the user.

User-oriented measures
  coverage ratio
  novelty ratio
  relative recall
  recall effort

coverage ratio = |Rk| / |U|
  high coverage ratio: the system finds most of the relevant documents the user expected to see

novelty ratio = |Ru| / ( |Ru| + |Rk| )
  high novelty ratio: the system reveals many new relevant documents which were previously unknown to the user

relative recall = ( |Rk| + |Ru| ) / |U|

recall effort = the number of relevant documents the user expected to find, divided by the number of documents that had to be examined to find them

where
  |R|:  the relevant documents
  |A|:  the answer set proposed by the system
  |U|:  the relevant documents known to the user
  |Rk|: the relevant documents known to the user which were retrieved
  |Ru|: the relevant documents previously unknown to the user which were retrieved

[Figure: Venn diagram relating the relevant documents |R|, the answer set |A|, |U|, |Rk|, and |Ru|]
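A minimal sketch of the set-based user-oriented measures (not from the slides; the sets below are hypothetical, and recall effort is omitted since it needs data about the user's search effort):

```python
# Illustrative sketch: coverage, novelty, and relative recall computed from the
# answer set and the relevant documents known to the user (U).

def user_oriented_measures(answer, relevant, known_to_user):
    """answer: retrieved docs; relevant: all relevant docs; known_to_user: U."""
    retrieved = set(answer)
    rk = retrieved & known_to_user                   # known relevant docs retrieved, Rk
    ru = (retrieved & relevant) - known_to_user      # previously unknown relevant docs retrieved, Ru
    return {
        "coverage": len(rk) / len(known_to_user),
        "novelty": len(ru) / (len(ru) + len(rk)) if (ru or rk) else 0.0,
        "relative_recall": (len(rk) + len(ru)) / len(known_to_user),
    }

# hypothetical sets
relevant = {"d1", "d2", "d3", "d4", "d5"}
known = {"d1", "d2", "d3"}
answer = ["d1", "d4", "d7", "d2"]
print(user_oriented_measures(answer, relevant, known))
# coverage = 2/3, novelty = 1/3, relative recall = 3/3 = 1.0
```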
