DCC - UFMG
[Figure: examples of tagged objects (image, audio); example tag: Madonna]
Suggest terms related to the content of a target object
Improve tag quality: higher completeness and accuracy, less noise
Ex. of noise: misspellings, unrelated words
[Figure: textual features of an object: description, comments, tags]
Our Goal:
Exploit the 3 dimensions jointly
Problem Statement
Develop a function to estimate the relevance of a candidate term as a recommendation for the target object
Relevant term:
[Figure: candidate terms ranked by relevance, from the 1st term downward]
Problem Statement
Input
Io: set of tags previously assigned to the object
Fo = {Fo1, Fo2, ..., Fon}: set of textual features in object o
Textual features exploited here: title and description
Output
Co : list of ranked candidate terms for the object o
Co ∩ Io = ∅
θ(X → c): confidence of the rule X → c
R: set of selected association rules
ℓ: size limit for X
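The legend above (rule confidence, set of selected rules, size limit for X) describes a co-occurrence score built from association rules. A minimal sketch, assuming the score simply adds the confidences of all selected rules whose antecedent X is contained in the tags already assigned to the object. The rule store, the function name `sum_relevance` and the confidences are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical rule store: (antecedent tag set X, consequent term c) -> confidence.
Rules = Dict[Tuple[FrozenSet[str], str], float]


def sum_relevance(candidate: str, assigned_tags: Iterable[str],
                  rules: Rules, size_limit: int) -> float:
    """Add the confidences of rules X -> candidate whose X fits the object."""
    tags = set(assigned_tags)
    total = 0.0
    for (antecedent, consequent), confidence in rules.items():
        if (consequent == candidate
                and antecedent <= tags               # X is a subset of the assigned tags
                and len(antecedent) <= size_limit):  # respect the size limit for X
            total += confidence
    return total


# Toy rules with made-up confidences.
toy_rules: Rules = {
    (frozenset({"pop"}), "madonna"): 0.5,
    (frozenset({"pop", "80s"}), "madonna"): 0.25,
    (frozenset({"rock"}), "madonna"): 0.125,
    (frozenset({"pop"}), "music"): 0.75,
}

score_limit_1 = sum_relevance("madonna", ["pop", "80s"], toy_rules, size_limit=1)
score_limit_2 = sum_relevance("madonna", ["pop", "80s"], toy_rules, size_limit=2)
print(score_limit_1, score_limit_2)
```

Raising the size limit lets rules with larger antecedents contribute, which is why the second score is at least as large as the first.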
our proposal
Traditionally used in IR
TS(c, o) = number of textual features in object o that contain term c
Previously employed for quality assessment of textual features [Figueiredo et al., 2009]
TS(Madonna, o) = 2
TS(political, o) = 1
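The two values above can be reproduced with a small sketch. It assumes an object is a mapping from feature name to text and that terms are matched case-insensitively on whole words. The toy object, the tokenizer and the function names are hypothetical. A plain term frequency is included for contrast, assuming it counts every occurrence across features.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_spread(term: str, features: Dict[str, str]) -> int:
    """Number of textual features of the object that contain the term."""
    needle = term.lower()
    return sum(1 for text in features.values() if needle in tokenize(text))


def term_frequency(term: str, features: Dict[str, str]) -> int:
    """Total number of occurrences of the term over all textual features."""
    needle = term.lower()
    return sum(tokenize(text).count(needle) for text in features.values())


# Hypothetical object with the two features exploited in the deck.
toy_object = {
    "title": "Madonna live in concert",
    "description": "Madonna sings while a political debate goes on backstage",
}

print(term_spread("Madonna", toy_object))    # 2
print(term_spread("political", toy_object))  # 1
```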
wTF and wTS weight a term based on the average descriptive power of its containing textual feature
Average descriptive power of Fi: Average Feature Spread, AFS(Fi) [Figueiredo et al., 2009]
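One way to read the weighting described above is that each feature containing the term contributes its AFS value rather than a flat count of 1. The sketch below makes that assumption explicit. The AFS values are invented, and in practice they would be estimated from a collection of objects. The function name and toy object are hypothetical.

```python
from typing import Dict


def weighted_term_spread(term: str, features: Dict[str, str],
                         afs: Dict[str, float]) -> float:
    """Sum AFS(Fi) over the features Fi of the object that contain the term."""
    needle = term.lower()
    return sum(afs[name] for name, text in features.items()
               if needle in text.lower().split())


# Hypothetical object and made-up AFS values per feature.
toy_object = {
    "title": "madonna live in concert",
    "description": "madonna sings during a political rally",
}
toy_afs = {"title": 2.0, "description": 1.5}

wts_madonna = weighted_term_spread("madonna", toy_object, toy_afs)
wts_political = weighted_term_spread("political", toy_object, toy_afs)
print(wts_madonna, wts_political)
```

Under this reading, a term that appears only in a feature with low average descriptive power is ranked below a term that appears in a more descriptive feature, even when both have the same plain spread.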
Discriminative Power
Capacity of a term to distinguish an object from others
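As an illustration of discriminative power, the sketch below uses an IDF-style score: a term that occurs in the tags of few objects scores higher than one that occurs almost everywhere. This is an assumption in the spirit of metrics such as IFF, named later in the deck. The slides do not show the exact formula, so the smoothing used here is only one plausible choice, and the collection is made up.

```python
import math
from typing import List, Set


def inverse_frequency(term: str, tag_sets: List[Set[str]]) -> float:
    """Smoothed log ratio of collection size to the number of objects tagged with the term."""
    containing = sum(1 for tags in tag_sets if term in tags)
    return math.log((len(tag_sets) + 1) / (containing + 1))


# Hypothetical collection: one set of tags per object.
toy_collection = [
    {"music", "pop", "madonna"},
    {"music", "rock"},
    {"music", "jazz"},
    {"music", "pop"},
]

common_score = inverse_frequency("music", toy_collection)
rare_score = inverse_frequency("madonna", toy_collection)
print(common_score, rare_score)
```

The term "music" appears in every object and scores zero, so it does little to distinguish one object from another, whereas "madonna" appears once and scores highest.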
Candidate terms
Generated from co-occurrences with pre-assigned tags
Extracted from multiple textual features
LATRE+DP = α LATRE + (1 - α) DP
where parameter 0 ≤ α ≤ 1 is a weighting factor
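The weighted combination above can be sketched as a linear interpolation between a co-occurrence score and a descriptive power score, followed by a sort. The weighting factor is called `alpha` here. The score dictionaries and function names are hypothetical, and the sketch assumes both scores are already on comparable scales.

```python
from typing import Dict, Iterable, List


def combine(cooccurrence: float, descriptive: float, alpha: float) -> float:
    """Linear interpolation of the two scores, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cooccurrence + (1.0 - alpha) * descriptive


def rank_candidates(cooc_scores: Dict[str, float], dp_scores: Dict[str, float],
                    alpha: float, assigned_tags: Iterable[str]) -> List[str]:
    """Rank candidate terms by combined score, excluding tags already assigned."""
    candidates = (set(cooc_scores) | set(dp_scores)) - set(assigned_tags)
    scored = {c: combine(cooc_scores.get(c, 0.0), dp_scores.get(c, 0.0), alpha)
              for c in candidates}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda c: (-scored[c], c))


# Made-up scores for three candidate terms.
toy_cooc = {"music": 0.75, "madonna": 0.5}
toy_dp = {"madonna": 1.0, "concert": 0.5}

balanced = rank_candidates(toy_cooc, toy_dp, alpha=0.5, assigned_tags=["pop"])
cooc_only = rank_candidates(toy_cooc, toy_dp, alpha=1.0, assigned_tags=["pop"])
print(balanced, cooc_only)
```

With `alpha = 1.0` the ranking falls back to the co-occurrence score alone. Lower values let terms that appear in the object's own text, such as "concert", enter the list even when no rule supports them.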
Tag Recommendation Strategies: Our New Heuristics
8 heuristics: Sum+ or LATRE combined with TS, TF, wTS or wTF
Sum+TS, Sum+TF, Sum+wTS, Sum+wTF
LATRE+TS, LATRE+TF, LATRE+wTS, LATRE+wTF
12 metrics
Co-occurrence: Sum (with size limit ℓ = 1 and ℓ = 3); Vote, Vote+ and Sum+ (alternatives to Sum)
Descriptive Power: TS, TF, wTS, wTF
Discriminative Power: IFF, Stability, Entropy
Exploited techniques
RankSVM
160,000 videos
Evaluation Methodology
Automatic Evaluation
Tags in an object
expected answer
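Under the automatic evaluation above, tags of the object act as the expected answer, and the results later in the deck are reported as P@5. A minimal sketch of precision at k under that setup follows. The function name, the ranked list and the expected tags are hypothetical.

```python
from typing import List, Set


def precision_at_k(ranked: List[str], expected: Set[str], k: int = 5) -> float:
    """Fraction of the top-k recommended terms that are in the expected answer."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for term in ranked[:k] if term in expected)
    return hits / k


# Hypothetical recommendation list and expected tags for one object.
toy_ranking = ["madonna", "music", "concert", "live", "pop", "80s"]
toy_expected = {"madonna", "live", "dance"}

p_at_5 = precision_at_k(toy_ranking, toy_expected, k=5)
print(p_at_5)  # 0.4
```

The sketch divides by k even when fewer than k terms are returned, which penalizes short recommendation lists. Other conventions exist, and the deck does not say which one was used.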
Evaluation Methodology
2 evaluation scenarios
Data divided in 3 subsets according to the number of tags by object
YouTube, number of tags per object:
Smallest: 2-5
Medium: 6-9
Largest: 10-77
Mixed (all objects): 2-77
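The split by number of tags can be sketched as a simple bucketing step. The ranges are the YouTube ones listed above. The labels "smallest", "medium" and "largest" are taken from the result tables later in the deck, and the function name is hypothetical.

```python
def youtube_scenario(num_tags: int) -> str:
    """Map the number of tags of a YouTube object to its evaluation subset."""
    if 2 <= num_tags <= 5:
        return "smallest"
    if 6 <= num_tags <= 9:
        return "medium"
    if 10 <= num_tags <= 77:
        return "largest"
    raise ValueError("object falls outside the 2-77 tag range used in the deck")


# Group a few hypothetical objects (given only by their tag counts).
tag_counts = [3, 7, 12, 5, 77]
subsets = [youtube_scenario(n) for n in tag_counts]
print(subsets)
```

The mixed scenario needs no bucketing, since it uses every object in the 2-77 range.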
Representative Results
State-of-the-Art Methods. Mixed Scenario. Average P@5 and 95% Confidence Intervals
Strategy    LastFM           YahooVideo       YouTube
Sum+        0.411 ± 0.001    0.484 ± 0.003    0.245 ± 0.002
LATRE       0.405 ± 0.001    0.608 ± 0.003    0.285 ± 0.004
CTTR        0.260 ± 0.001    0.465 ± 0.004    0.376 ± 0.002
LATRE outperforms Sum+ in most of the cases
Up to 25% in P@5
CTTR: good alternative in some cases
Representative Results
State-of-the-Art Methods vs. Our New Heuristics. Mixed Scenario. Average P@5 and 95% Confidence Intervals
Strategy     LastFM           YahooVideo       YouTube
Sum+         0.411 ± 0.001    0.484 ± 0.003    0.245 ± 0.002
LATRE        0.405 ± 0.001    0.608 ± 0.003    0.285 ± 0.004
CTTR         0.260 ± 0.001    0.465 ± 0.004    0.376 ± 0.002
Sum+TF       0.417 ± 0.001    0.643 ± 0.003    0.462 ± 0.001
Sum+TS       0.418 ± 0.002    0.674 ± 0.003    0.475 ± 0.002
Sum+wTF      0.417 ± 0.001    0.666 ± 0.002    0.490 ± 0.002
Sum+wTS      0.417 ± 0.002    0.707 ± 0.002    0.502 ± 0.003
LATRE+TF     0.408 ± 0.001    0.698 ± 0.002    0.472 ± 0.002
LATRE+TS     0.411 ± 0.001    0.716 ± 0.003    0.467 ± 0.003
LATRE+wTF    0.408 ± 0.001    0.729 ± 0.002    0.503 ± 0.003
LATRE+wTS    0.411 ± 0.001    0.733 ± 0.003    0.489 ± 0.003
Gains over the best state-of-the-art method: 40% in P@5, 32% in Recall, 62% in AP
TF < TS < wTF < wTS for LATRE+DP and Sum+DP
Most promising heuristic: LATRE+wTS
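LastFM
Strategy         Mixed            Smallest         Medium           Largest
LATRE+wTS        0.411 ± 0.001    0.486 ± 0.005    0.465 ± 0.005    0.388 ± 0.003
GP-based         0.433 ± 0.003    0.500 ± 0.005    0.476 ± 0.002    0.434 ± 0.003
RankSVM-based    0.411 ± 0.002    0.499 ± 0.007    0.450 ± 0.005    0.393 ± 0.003
YahooVideo
Strategy         Mixed            Smallest         Medium           Largest
LATRE+wTS        0.733 ± 0.003    0.818 ± 0.005    0.729 ± 0.003    0.780 ± 0.007
GP-based         0.743 ± 0.007    0.822 ± 0.004    0.734 ± 0.003    0.789 ± 0.006
RankSVM-based    0.752 ± 0.002    0.826 ± 0.003    0.725 ± 0.003    0.800 ± 0.005
YouTube
Strategy         Smallest
LATRE+wTS        0.593 ± 0.003
GP-based         0.595 ± 0.003
RankSVM-based    0.601 ± 0.005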
Largest gains due to the use of descriptive power metric (e.g., wTS)
Future Work
Manual assessment of relevance, usefulness, diversity
Thank You
Prototype
GreenMeter: Demo in SIGIR 2011
Outline
1. Motivation
2. Contextualization
3. State-of-the-Art
4. Goals and Contributions
5. Relevance Metrics for Tag Recommendation
LastFM
Strategy    Mixed            Smallest         Medium           Largest
Sum+        0.411 ± 0.001    0.454 ± 0.003    0.433 ± 0.004    0.391 ± 0.001
LATRE       0.405 ± 0.001    0.465 ± 0.004    0.457 ± 0.005    0.384 ± 0.004
CTTR        0.260 ± 0.001    0.285 ± 0.003    0.267 ± 0.002    0.217 ± 0.002
Sum+TF      0.417 ± 0.001    0.470 ± 0.004    0.436 ± 0.004    0.395 ± 0.001
Sum+TS      0.418 ± 0.002    -                -                0.396 ± 0.001
YahooVideo
Strategy     Mixed            Smallest         Medium           Largest
Sum+         0.484 ± 0.003    0.453 ± 0.007    0.511 ± 0.004    0.615 ± 0.004
LATRE        0.608 ± 0.003    0.525 ± 0.006    -                -
CTTR         0.465 ± 0.004    0.649 ± 0.005    -                -
Sum+TF       0.643 ± 0.003    0.755 ± 0.002    -                -
Sum+TS       0.674 ± 0.003    0.764 ± 0.003    -                -
Sum+wTF      0.666 ± 0.002    0.784 ± 0.003    -                -
Sum+wTS      0.707 ± 0.002    0.795 ± 0.004    -                -
LATRE+TF     0.698 ± 0.002    0.781 ± 0.002    -                -
LATRE+TS     0.716 ± 0.003    0.785 ± 0.001    -                -
LATRE+wTF    0.729 ± 0.002    0.821 ± 0.004    -                -
LATRE+wTS    0.733 ± 0.003    0.818 ± 0.005    0.729 ± 0.003    0.780 ± 0.007
YouTube
Strategy     Mixed            Smallest         Medium           Largest
Sum+         0.245 ± 0.002    0.211 ± 0.006    0.212 ± 0.003    0.282 ± 0.004
LATRE        0.285 ± 0.004    0.219 ± 0.005    0.242 ± 0.005    0.326 ± 0.006
CTTR         0.376 ± 0.002    0.463 ± 0.005    0.337 ± 0.004    0.269 ± 0.002
Sum+TF       0.462 ± 0.001    0.552 ± 0.004    0.421 ± 0.006    0.403 ± 0.003
Sum+TS       0.475 ± 0.002    0.560 ± 0.004    0.433 ± 0.005    0.419 ± 0.003
Sum+wTF      0.490 ± 0.002    0.583 ± 0.003    0.451 ± 0.005    0.416 ± 0.004
Sum+wTS      0.502 ± 0.003    0.593 ± 0.003    0.461 ± 0.005    0.431 ± 0.004
LATRE+TF     0.472 ± 0.002    0.557 ± 0.004    0.436 ± 0.004    0.425 ± 0.006
LATRE+TS     0.467 ± 0.003    0.561 ± 0.004    0.445 ± 0.004    0.441 ± 0.007
LATRE+wTF    0.503 ± 0.003    0.596 ± 0.004    0.464 ± 0.004    0.439 ± 0.005
LATRE+wTS    0.489 ± 0.003    0.593 ± 0.003    0.471 ± 0.005    0.450 ± 0.007
LastFM
Strategy         Mixed            Smallest         Medium           Largest
GP-based         0.433 ± 0.003    0.500 ± 0.005    0.476 ± 0.002    0.434 ± 0.003
RankSVM-based    0.411 ± 0.002    0.499 ± 0.007    0.450 ± 0.005    0.393 ± 0.003
YahooVideo
Strategy         Mixed            Smallest         Medium           Largest
LATRE+wTS        0.733 ± 0.003    0.818 ± 0.005    0.729 ± 0.003    0.780 ± 0.007
GP-based         0.743 ± 0.007    0.822 ± 0.004    0.734 ± 0.003    0.789 ± 0.006
RankSVM-based    0.752 ± 0.002    0.826 ± 0.003    0.725 ± 0.003    0.800 ± 0.005
[Figure: bar charts comparing the best baseline (LATRE or CTTR), Sum+wTS and LATRE+wTS in the Mixed, Smallest, Medium and Largest scenarios, one chart per dataset]
Gains over the best state-of-the-art method: 40% in P@5, 32% in Recall, 62% in AP
TF < TS < wTF < wTS for LATRE+DP and Sum+DP
Most promising heuristic: LATRE+wTS