
By
S. DHIVYA (2014614007)
M.E. CSE (I Year)


Time and space
  The shorter the response time and the smaller the space used, the better the system is.

Performance evaluation (for data retrieval)
  Performance of the indexing structure
  The interaction with the operating system
  The delays in communication channels
  The overheads introduced by software layers

Performance evaluation (for information retrieval)
  Besides time and space, retrieval performance is also an issue.

Factors of retrieval performance evaluation

1) Test reference collection
   * a collection of documents
   * a set of example information requests
   * a set of relevant documents (provided by specialists) for each example information request

2) Evaluation measure - goodness of the retrieval strategy S
   - the similarity between the set of documents retrieved by S and the set of relevant documents provided by the specialists

Retrieval evaluation measures

1) Recall
2) Precision

Alternative measures
1) E-measure
2) Harmonic mean
3) Satisfaction
4) Frustration

Retrieval task - query processing

Batch mode
  The user submits a query and receives an answer back.
  Only the quality of the generated answer set is evaluated, not how it is generated.

Interactive mode
  The user specifies his information need through a series of interactive steps with the system.
  Aspects
    user effort
    characteristics of the interface design
    guidance provided by the system
    duration of the session

Recall
  the fraction of the relevant documents that have been retrieved

    Recall = |Ra| / |R|

Precision
  the fraction of the retrieved documents that are relevant

    Precision = |Ra| / |A|

[Figure: the document collection, containing the relevant documents |R|, the answer set |A|, and their intersection, the relevant documents in the answer set |Ra|]
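Both measures are set-based, so they can be computed directly from the relevant set and the answer set. A minimal sketch in Python (illustrative, not part of the slides; the function name and the example sets are assumptions):

```python
# Illustrative sketch: set-based recall and precision for one query,
# Recall = |Ra| / |R| and Precision = |Ra| / |A|.

def recall_precision(relevant: set, answer: set) -> tuple:
    ra = relevant & answer               # relevant documents in the answer set, Ra
    recall = len(ra) / len(relevant)     # |Ra| / |R|
    precision = len(ra) / len(answer)    # |Ra| / |A|
    return recall, precision

# hypothetical query: 10 relevant documents, 15 retrieved, 5 of them relevant
relevant = {f"d{i}" for i in range(1, 11)}
answer = {f"d{i}" for i in range(6, 21)}
print(recall_precision(relevant, answer))   # (0.5, 0.333...)
```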

The user is not usually presented with all the documents in the answer set A at once.

Example
  Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}

Ranking for query q by a retrieval algorithm, with (precision, recall) at each relevant document:
   1. d123 (100%, 10%)    6. d9   (50%, 30%)   11. d38
   2. d84                 7. d511              12. d48
   3. d56  (66%, 20%)     8. d129              13. d250
   4. d6                  9. d187              14. d113
   5. d8                 10. d25  (40%, 40%)   15. d3  (33%, 50%)
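The (precision, recall) pairs above can be regenerated by walking down the ranking and recording a point each time a relevant document is seen. A short sketch (not from the slides; names are illustrative):

```python
# Illustrative sketch: (precision, recall) points observed while scanning a ranking.

def precision_recall_points(relevant, ranking):
    points, seen = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            points.append((seen / i, seen / len(relevant)))   # (precision, recall)
    return points

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(precision_recall_points(Rq, ranking))
# [(1.0, 0.1), (0.667, 0.2), (0.5, 0.3), (0.4, 0.4), (0.333, 0.5)]
```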

Precision versus recall based on 11 standard recall levels: 0%, 10%, 20%, ..., 100%

[Figure: interpolated precision versus recall curve; precision on the y-axis, recall on the x-axis, both from 0% to 100%]

Average the precision figures at each recall level over all queries:

  P(r) = sum_{i=1}^{Nq} P_i(r) / Nq

  P(r):   the average precision at the recall level r
  Nq:     the number of queries used
  P_i(r): the precision at recall level r for the i-th query
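A minimal sketch of this averaging (not from the slides; the per-query precision values below are hypothetical):

```python
# Illustrative sketch: average P_i(r) over Nq queries at one fixed recall level r.

def average_precision_at_recall(per_query_precision):
    """per_query_precision: one P_i(r) value per query, all at the same recall level r."""
    return sum(per_query_precision) / len(per_query_precision)

# hypothetical precision values of three queries at recall level r = 30%
print(average_precision_at_recall([0.50, 0.33, 0.25]))   # 0.36
```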

Example
  Rq = {d3, d56, d129}

Ranking for query q, with (precision, recall) at each relevant document:
   1. d123                  6. d9                   11. d38
   2. d84                   7. d511                 12. d48
   3. d56 (33.3%, 33.3%)    8. d129 (25%, 66.6%)    13. d250
   4. d6                    9. d187                 14. d113
   5. d8                   10. d25                  15. d3  (20%, 100%)

r_j (j in {0, 1, 2, ..., 10}): a reference to the j-th standard recall level
(e.g., r5 references the recall level 50%)

  P(r_j) = max_{r_j <= r <= r_(j+1)} P(r)

Example
  d56  (33.3%, 33.3%)
  d129 (25%, 66.6%)
  d3   (20%, 100%)

Interpolated precision at the 11 standard recall levels:
  r0: (33.3%, 0%)     r1: (33.3%, 10%)    r2: (33.3%, 20%)    r3: (33.3%, 30%)
  r4: (25%, 40%)      r5: (25%, 50%)      r6: (25%, 60%)
  r7: (20%, 70%)      r8: (20%, 80%)      r9: (20%, 90%)      r10: (20%, 100%)
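A sketch of this interpolation (not from the slides). It uses the common "maximum precision at any recall >= r_j" form of the rule, which reproduces the table above for this example:

```python
# Illustrative sketch: interpolated precision at the 11 standard recall levels.

def interpolated_precision_11pt(pr_points):
    """pr_points: (precision, recall) pairs observed down the ranking."""
    levels = [j / 10 for j in range(11)]                  # 0.0, 0.1, ..., 1.0
    return [max((p for p, r in pr_points if r >= level), default=0.0)
            for level in levels]

# example query Rq = {d3, d56, d129}
points = [(0.333, 0.333), (0.25, 0.666), (0.20, 1.0)]
print(interpolated_precision_11pt(points))
# [0.333, 0.333, 0.333, 0.333, 0.25, 0.25, 0.25, 0.20, 0.20, 0.20, 0.20]
```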

The curve of precision versus recall which results from averaging the results for various queries.

[Figure: averaged precision versus recall curve; precision on the y-axis (0-100%), recall on the x-axis (0-100%)]

Average precision at seen relevant documents
  Generate a single value summary of the ranking by averaging the precision figures obtained after each new relevant document is observed.

Example (precision after each relevant document in parentheses)
   1. d123 (1.0)     6. d9   (0.5)    11. d38
   2. d84            7. d511          12. d48
   3. d56  (0.66)    8. d129          13. d250
   4. d6             9. d187          14. d113
   5. d8            10. d25  (0.4)    15. d3  (0.33)

  (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 = 0.57

  Favors systems which retrieve relevant documents quickly.
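A minimal sketch of this single-value summary (not from the slides; the function name is illustrative):

```python
# Illustrative sketch: average precision at seen relevant documents.

def average_precision_at_seen(relevant, ranking):
    precisions, seen = [], 0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            seen += 1
            precisions.append(seen / i)    # precision right after this relevant doc
    return sum(precisions) / len(precisions) if precisions else 0.0

Rq = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(average_precision_at_seen(Rq, ranking))   # (1 + 0.66 + 0.5 + 0.4 + 0.33) / 5 ≈ 0.57
```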

Reciprocal Rank (RR)
  Equal to the precision at the first retrieved relevant document.
  Useful for tasks that need only one relevant document, e.g., question answering.

Mean Reciprocal Rank (MRR)
  The mean of RR over several queries.
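A minimal sketch (not from the slides; the two queries below are hypothetical):

```python
# Illustrative sketch: reciprocal rank (RR) of one ranking and
# mean reciprocal rank (MRR) over several queries.

def reciprocal_rank(relevant, ranking):
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / i          # precision at the first relevant document
    return 0.0                      # no relevant document retrieved

def mean_reciprocal_rank(queries):
    """queries: list of (relevant_set, ranking) pairs."""
    return sum(reciprocal_rank(rel, rank) for rel, rank in queries) / len(queries)

# hypothetical two-query example
q1 = ({"d3"}, ["d7", "d3", "d9"])        # first relevant at rank 2 -> RR = 0.5
q2 = ({"d5"}, ["d5", "d1"])              # first relevant at rank 1 -> RR = 1.0
print(mean_reciprocal_rank([q1, q2]))    # 0.75
```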

R-Precision
  Generate a single value summary of the ranking by computing the precision at the R-th position in the ranking, where R is the total number of relevant documents for the current query.

Example 1 (the first example query, with R = 10 relevant documents)
   1. d123    6. d9
   2. d84     7. d511
   3. d56     8. d129
   4. d6      9. d187
   5. d8     10. d25
  R = 10 and # relevant in the top 10 = 4
  R-precision = 4/10 = 0.4

Example 2 (the second example query, with R = 3 relevant documents)
   1. d123
   2. d84
   3. d56
  R = 3 and # relevant in the top 3 = 1
  R-precision = 1/3 = 0.33
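A minimal sketch that reproduces both examples (not from the slides; names are illustrative):

```python
# Illustrative sketch: R-precision = precision at position R,
# where R is the total number of relevant documents for the query.

def r_precision(relevant, ranking):
    R = len(relevant)
    return sum(1 for doc in ranking[:R] if doc in relevant) / R

Rq1 = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
Rq2 = {"d3", "d56", "d129"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
           "d25", "d38", "d48", "d250", "d113", "d3"]
print(r_precision(Rq1, ranking))   # 4/10 = 0.4
print(r_precision(Rq2, ranking))   # 1/3 ≈ 0.33
```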

Precision Histograms
  An R-precision graph for several queries.
  Compare the retrieval history of two algorithms.

  RP_{A/B}(i) = RP_A(i) - RP_B(i)

  where RP_A(i) and RP_B(i) are the R-precision values of retrieval algorithms A and B for the i-th query

  RP_{A/B}(i) = 0: both algorithms have equivalent performance for the i-th query
  RP_{A/B}(i) > 0: A has better retrieval performance for query i
  RP_{A/B}(i) < 0: B has better retrieval performance for query i
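A minimal sketch of the per-query differences plotted in such a histogram (not from the slides; the R-precision values below are hypothetical):

```python
# Illustrative sketch: RP_{A/B}(i) = RP_A(i) - RP_B(i), one value per query.

def precision_histogram(rp_a, rp_b):
    """rp_a, rp_b: R-precision values, one per query, for algorithms A and B."""
    return [a - b for a, b in zip(rp_a, rp_b)]

# hypothetical R-precision values over 5 queries
rp_a = [0.4, 0.3, 0.6, 0.2, 0.5]
rp_b = [0.3, 0.5, 0.6, 0.1, 0.7]
print(precision_histogram(rp_a, rp_b))   # [0.1, -0.2, 0.0, 0.1, -0.2]
```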

[Figure: precision histogram - RP_{A/B}(i), ranging from -1.5 to 1.5, plotted against the query number]

Statistical summary regarding the set of all the queries in a retrieval task
  the number of queries used in the task
  the total number of documents retrieved by all queries
  the total number of relevant documents which were effectively retrieved when all queries are considered
  the total number of relevant documents which could have been retrieved by all queries

Limitations of recall and precision
  Estimation of maximal recall requires knowledge of all the documents in the collection.
  Recall and precision capture different aspects of the set of retrieved documents.
  Recall and precision measure the effectiveness over queries in batch mode.
  Recall and precision are defined under the enforcement of a linear ordering of the retrieved documents.

Harmonic mean F(j) of recall and precision

  F(j) = 2 / ( 1/R(j) + 1/P(j) )

  R(j): the recall for the j-th document in the ranking
  P(j): the precision for the j-th document in the ranking

  Equivalently, F = 2PR / (P + R)

Example (Rq = {d3, d56, d129}), with (precision, recall) at each relevant document:
   1. d123                  6. d9                   11. d38
   2. d84                   7. d511                 12. d48
   3. d56 (33.3%, 33.3%)    8. d129 (25%, 66.6%)    13. d250
   4. d6                    9. d187                 14. d113
   5. d8                   10. d25                  15. d3  (20%, 100%)

  F(3)  = 2 / ( 1/0.33 + 1/0.33 ) = 0.33
  F(8)  = 2 / ( 1/0.25 + 1/0.67 ) = 0.36
  F(15) = 2 / ( 1/0.20 + 1/1 )    = 0.33
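A minimal sketch that reproduces these values (not from the slides; the function name is illustrative):

```python
# Illustrative sketch: harmonic mean of precision and recall,
# F(j) = 2 / (1/R(j) + 1/P(j)), equivalently 2PR / (P + R).

def f_measure(precision: float, recall: float) -> float:
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.333, 0.333), 2))   # F(3)  ≈ 0.33
print(round(f_measure(0.25, 0.667), 2))    # F(8)  ≈ 0.36
print(round(f_measure(0.20, 1.0), 2))      # F(15) ≈ 0.33
```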

E evaluation measure
  Allows the user to specify whether he is more interested in recall or in precision.

  E(j) = 1 - (1 + b^2) / ( b^2/R(j) + 1/P(j) )

  The related weighted F measure: F_b = (b^2 + 1) P R / (b^2 P + R), so E(j) = 1 - F_b(j).
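A minimal sketch of the E measure (not from the slides; the precision/recall values used are taken from the earlier example):

```python
# Illustrative sketch: E(j) = 1 - (1 + b^2) / (b^2/R(j) + 1/P(j)).

def e_measure(precision: float, recall: float, b: float) -> float:
    if precision == 0 or recall == 0:
        return 1.0                     # worst possible value
    return 1 - (1 + b * b) / (b * b / recall + 1 / precision)

# with b = 1, E reduces to 1 - F (the complement of the harmonic mean)
print(round(e_measure(0.25, 0.667, b=1.0), 2))   # ≈ 0.64, i.e. 1 - 0.36
```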

Basic assumption of the previous evaluation measures
  The set of relevant documents for a query is the same, independent of the user.

User-oriented measures
  coverage ratio
  novelty ratio
  relative recall
  recall effort

coverage ratio = |Rk| / |U|
  high coverage ratio: the system finds most of the relevant documents the user expected to see

novelty ratio = |Ru| / ( |Ru| + |Rk| )
  high novelty ratio: the system reveals many new relevant documents which were previously unknown to the user

relative recall = ( |Rk| + |Ru| ) / |U|

recall effort = the number of relevant documents the user expected to find, divided by the number of documents that had to be examined to find them

where
  |R|:  the relevant documents
  |A|:  the answer set proposed by the system
  |U|:  the relevant documents known to the user
  |Rk|: the relevant documents known to the user which were retrieved
  |Ru|: the relevant documents previously unknown to the user which were retrieved

[Figure: Venn diagram relating the relevant documents |R|, the answer set |A|, |U|, |Rk|, and |Ru|]
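A minimal sketch of the set-based user-oriented measures (not from the slides; the sets below are hypothetical, and recall effort is omitted since it needs data about the user's search effort):

```python
# Illustrative sketch: coverage, novelty, and relative recall computed from the
# answer set and the relevant documents known to the user (U).

def user_oriented_measures(answer, relevant, known_to_user):
    """answer: retrieved docs; relevant: all relevant docs; known_to_user: U."""
    retrieved = set(answer)
    rk = retrieved & known_to_user                   # known relevant docs retrieved, Rk
    ru = (retrieved & relevant) - known_to_user      # previously unknown relevant docs retrieved, Ru
    return {
        "coverage": len(rk) / len(known_to_user),
        "novelty": len(ru) / (len(ru) + len(rk)) if (ru or rk) else 0.0,
        "relative_recall": (len(rk) + len(ru)) / len(known_to_user),
    }

# hypothetical sets
relevant = {"d1", "d2", "d3", "d4", "d5"}
known = {"d1", "d2", "d3"}
answer = ["d1", "d4", "d7", "d2"]
print(user_oriented_measures(answer, relevant, known))
# coverage = 2/3, novelty = 1/3, relative recall = 3/3 = 1.0
```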
