R_{tag}(t, q) = P(t) \sum_{v \in q} \tilde{P}(t|v), \quad \tilde{P}(t|v) = \begin{cases} P(t|v), & \text{if } P(t|v) > 0 \\ \varepsilon, & \text{otherwise} \end{cases}  (4)

where ε is a non-zero value significantly smaller than the lowest
P(t|v), in order to avoid the zero-probability problem (please see [4]
for further details).
The probability of a tag t occurring and the conditional
probability of two tags t and v co-occurring in a certain type of
context are defined as follows:
P(t) = \frac{|I_t|}{|I|},  (5)

where |I| represents the total number of images in a certain type of
context, where |I_t| and |I_v| respectively denote the number of images
annotated with t and v in the context used, and where |I_t \cap I_v|
indicates the number of images annotated with both t and v in the
context under consideration.
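As an illustration of how these probability estimates drive tag-statistics relevance, the following Python sketch computes P(t), an ε-smoothed P(t|v), and a relevance score for candidate tags. The toy collection, the tag names, the EPSILON value, and the exact way the scores are aggregated in r_tag are illustrative assumptions, not the paper's implementation:

```python
# Toy folksonomy: one tag set per image in the chosen context
# (personal, collective, or favorite image context). All names and
# values below are illustrative, not the paper's data.
images = [
    {"beach", "sea", "sunset"},
    {"beach", "sand"},
    {"sea", "boat"},
    {"sunset", "sky"},
]

EPSILON = 1e-6  # non-zero value smaller than the lowest P(t|v)

def p_tag(t):
    """P(t): fraction of images annotated with t, as in (5)."""
    return sum(t in img for img in images) / len(images)

def p_cooc(t, v):
    """Smoothed P(t|v): |I_t intersect I_v| / |I_v|, with EPSILON
    substituted when t and v never co-occur (zero-probability problem)."""
    i_v = [img for img in images if v in img]
    if not i_v:
        return EPSILON
    p = sum(t in img for img in i_v) / len(i_v)
    return p if p > 0 else EPSILON

def r_tag(t, q):
    """Tag-statistics relevance of candidate tag t to the query tags q.
    Summing the smoothed co-occurrence probabilities and weighting by the
    prior P(t) is one plausible aggregation, assumed for illustration."""
    return p_tag(t) * sum(p_cooc(t, v) for v in q)

# Rank candidate tags for an image already annotated with "beach":
ranked = sorted({"sea", "sand", "boat", "sky"},
                key=lambda t: r_tag(t, {"beach"}), reverse=True)
```

Tags that frequently co-occur with the query tags in the chosen context rise to the top, while the smoothing keeps never-co-occurring tags from being zeroed out entirely.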
Second, R_img(t, q) is modeled using the method outlined in [8]:

R_{img}(t, q) = P(t|q, Q) = \frac{P(q|t, Q) \, P(t|Q)}{p(q|Q)},  (7)

where Q consists of images visually similar to q. The denominator is
constant with respect to t and can therefore be ignored.
P(t|v) = \frac{|I_t \cap I_v|}{|I_v|},  (6)
The first conditional probability in (7) is modeled using a
Gaussian distribution (please see [8] for further details). The
second conditional probability in (7) can be expressed as follows:
P(t|Q) = \frac{|I_q \cap I_t|}{|I_q|},  (8)

where |I_q| represents the number of images in Q and |I_q \cap I_t| denotes
the number of images in Q annotated with tag t.
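A minimal Python sketch of this neighborhood-based model follows. The Gaussian kernel over visual distances stands in for the likelihood P(q|t, Q); the neighborhood Q, the distances, the bandwidth SIGMA, and the averaging inside p_q_given_t_Q are illustrative assumptions rather than the exact model of [8]:

```python
import math

# Visual neighborhood Q of a query image q: (distance to q, tag set) pairs.
# The distances, tags, and bandwidth below are illustrative values.
Q = [
    (0.2, {"beach", "sea"}),
    (0.5, {"sea", "boat"}),
    (0.9, {"city"}),
]

SIGMA = 0.5  # Gaussian kernel bandwidth (a free parameter)

def p_t_given_Q(t):
    """P(t|Q): fraction of images in Q annotated with t, as in (8)."""
    return sum(t in tags for _, tags in Q) / len(Q)

def p_q_given_t_Q(t):
    """Stand-in for P(q|t, Q): average Gaussian kernel value of the
    query-to-neighbor distance over the images in Q annotated with t."""
    dists = [d for d, tags in Q if t in tags]
    if not dists:
        return 0.0
    return sum(math.exp(-d * d / (2.0 * SIGMA * SIGMA)) for d in dists) / len(dists)

def r_img(t):
    """R_img(t, q) up to the constant denominator p(q|Q)."""
    return p_q_given_t_Q(t) * p_t_given_Q(t)
```

A tag scores highly when it is both frequent in the neighborhood and attached to images that are visually close to the query.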
3. EXPERIMENTS
3.1. Experimental setup
We assume that users actively bookmark favorite images on Flickr.
Following this assumption, we collected images from users
meeting the following requirements: 1) the users uploaded at least
100 images; 2) the users assigned at least 500 tags; and 3) the users
bookmarked at least 500 favorite images. As a result, using the
Flickr API, we retrieved a total of 387,397 images (on September
30, 2010) from 27 users. The retrieved images are either favorite
images of, or images owned by, the 27 selected users. Moreover, the images
are annotated with a total of 4,657,288 tags by 46,686 users.
To study the influence of the number of images in the favorite
image context on the effectiveness of tag recommendation, we
divided the 27 users into four groups according to their
bookmarking activity. These four groups are listed in Table 1. The
average number of favorite images per user in MIRFLICKR-25000
was used to create a group that represents users with average
bookmarking activity (Level 1 in Table 1).
Table 1. Minimum and maximum number of favorite images for
each group of users.

Activity                     Level 1   Level 2   Level 3   Level 4
Minimum # favorite images      849      2,345     4,830    11,441
Maximum # favorite images    1,277      2,917    10,001    56,115
For testing purposes, we randomly selected 378 images from
the images uploaded by the 27 users (14 test images per user), where
all of the selected test images were annotated with at least ten tags.
Further, we removed images that belong to the same event, resulting
in a final set of 342 test images. Experts created
a ground truth by making use of the tags suggested by all of the tag
recommendation methods studied, selecting tags that describe
visual aspects of the test images used.
To calculate the distance between images, we used global and
local image features. The 128-D MPEG-7 Scalable Color
Descriptor (SCD) [9] was used as a global image feature, whereas
local image features were created using the Bag of Visual Words
(BoVW) approach [10]. BoVW used a vocabulary of 500 visual
words, derived from 61,901 training images. Interest points were
detected, described, and clustered using Difference of Gaussians
(DoG), the Scale Invariant Feature Transform (SIFT), and K-
means clustering, respectively. To measure the distance between
two images in feature space, the L2 metric was used for both the
global and local image features.
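The L2 (Euclidean) distance between two feature vectors can be computed as follows; the two-dimensional toy vectors stand in for 128-D SCD or 500-D BoVW histograms:

```python
import math

def l2_distance(x, y):
    """Euclidean (L2) distance between two equally sized feature vectors,
    e.g. 128-D SCD histograms or 500-D BoVW histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

d = l2_distance([0.0, 0.0], [3.0, 4.0])  # -> 5.0
```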
We measured the effectiveness of tag recommendation using
the following metrics: the precision of the top five recommended
tags (P@5), success among the top five recommended tags (S@5),
and the precision of the top one recommended tag (P@1).
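These metrics can be sketched as follows; P@1 is precision_at_k with k = 1, and the recommended list and ground-truth set are illustrative:

```python
def precision_at_k(recommended, ground_truth, k):
    """P@k: fraction of the top-k recommended tags present in the ground truth."""
    return sum(t in ground_truth for t in recommended[:k]) / k

def success_at_k(recommended, ground_truth, k):
    """S@k: 1.0 if at least one of the top-k recommended tags is correct."""
    return 1.0 if any(t in ground_truth for t in recommended[:k]) else 0.0

# Illustrative ranked recommendation and expert ground truth:
recommended = ["beach", "sea", "car", "sky", "dog"]
ground_truth = {"beach", "sky"}

p_at_5 = precision_at_k(recommended, ground_truth, 5)  # 2 of 5 correct -> 0.4
s_at_5 = success_at_k(recommended, ground_truth, 5)    # at least one hit -> 1.0
p_at_1 = precision_at_k(recommended, ground_truth, 1)  # top tag correct -> 1.0
```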
3.2. Experimental results
3.2.1. Effectiveness of using favorite image context
Our first experiment made use of R_tag(t, q) to recommend tags.
Table 2, for instance, shows that, compared to the use of personal
and collective context, the use of favorite image context allows for a
relative improvement of 36% and 16%, respectively, in terms of P@5.
Table 2. Tag recommendation using R_tag(t, q).
Context P@5 S@5 P@1
Personal 0.158 0.609 0.318
Collective 0.208 0.612 0.373
Favorite 0.247 0.729 0.457
Our second experiment made use of R_img(t, q) to recommend
tags (using SCD). Table 3 shows that the use of favorite image
context is more effective than the use of personal and collective
context, irrespective of the metric used. As an example, compared
to personal and collective context, favorite image context
respectively allows for a relative improvement of 36% and 29% in
terms of P@5.
Table 3. Tag recommendation using SCD-based R_img(t, q).
Context P@5 S@5 P@1
Personal 0.187 0.629 0.384
Collective 0.208 0.611 0.324
Favorite 0.294 0.813 0.446
Our third experiment made use of R_img(t, q) to recommend tags
(using BoVW). Table 4 shows that the use of favorite image
context is most effective in terms of P@5 and S@5, while the use
of collective context is most effective in terms of P@1.
Table 4. Tag recommendation using BoVW-based R_img(t, q).
Context P@5 S@5 P@1
Personal 0.206 0.697 0.367
Collective 0.309 0.767 0.523
Favorite 0.317 0.813 0.513
3.2.2. Influence of fusion and bookmarking activity
Fig. 3 shows the influence of bookmarking activity on the
effectiveness of tag recommendation when making use of favorite
image context. The following values were heuristically selected
for the fusion weight, optimizing the effectiveness of tag
recommendation: 0.45 when using SCD and 0.2 when using BoVW. We can
observe that the combined use of tag statistics and visual similarity
is more effective than the separate use of either tag statistics or
visual similarity. Indeed, the combined use of tag statistics and
visual similarity allows suggesting tags that are ranked high by
both of the two elementary relevance functions. Also, the
combined use of tag statistics and BoVW is more effective than the
combined use of tag statistics and SCD.
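Assuming the fused relevance function is a convex combination of the two (already normalized) elementary relevance functions, which is a common choice but an assumption here, the fusion step can be sketched as:

```python
def fuse_scores(tag_scores, img_scores, w):
    """Per-tag convex combination of tag-statistics relevance and
    visual-similarity relevance. The combination form and the role of
    the weight w are assumptions; the experiments heuristically chose
    0.45 (SCD) and 0.2 (BoVW) for the weight."""
    tags = set(tag_scores) | set(img_scores)
    return {t: w * tag_scores.get(t, 0.0) + (1.0 - w) * img_scores.get(t, 0.0)
            for t in tags}

# Illustrative per-tag scores from the two elementary relevance functions:
fused = fuse_scores({"sea": 0.8, "sand": 0.4}, {"sea": 0.6, "boat": 0.9}, w=0.5)
ranking = sorted(fused, key=fused.get, reverse=True)  # tags ranked by fused score
```

A tag ranked high by both elementary functions, such as "sea" here, dominates tags favored by only one of them.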
Fig. 4 shows three example images. For each of these images,
the top 10 recommended tags are provided and sorted according to
their rank. The rank of a recommended tag is determined using
R(t,q). Correct tags (i.e., tags that belong to the ground truth) have
been underlined in Fig. 4. We can observe that fusing tag statistics
and visual similarity allows suggesting more relevant tags than the
separate use of either tag statistics or visual similarity.
Fig. 4. Example images with recommended tags using favorite image context.
For three example images, the top 10 recommended tags per method are:

tag statistics:
- Image 1: october, street, film, michigan, bw, california, color, architecture, city, japan
- Image 2: nature, africa, photo, image, moth, bird, wildlife, birds, macro, Australia
- Image 3: sardegna, mare, donna, fitness, street, red, luce, bw, green, light

visual similarity:
- Image 1: blue, street, film, clouds, night, black, bw, history, rome, color
- Image 2: nature, wildlife, macro, birds, africa, flower, animal, flowers, safari, butterfly
- Image 3: italy, bw, red, milano, green, street, silhouette, people, paris, canon

tag statistics + visual similarity:
- Image 1: street, blue, film, black, color, city, night, history, architecture, city
- Image 2: nature, wildlife, macro, birds, africa, animal, bird, flower, safari, flowers
- Image 3: italy, bw, red, green, street, milano, silhouette, light, sardegna, shadow
Fig. 3. P@5 for different levels of bookmarking activity (Level 1 to
Level 4), comparing tag statistics, visual similarity (SCD), visual
similarity (BoVW), tag statistics + visual similarity (SCD), and tag
statistics + visual similarity (BoVW).
4. CONCLUSIONS AND FUTURE WORK
This paper proposed a novel method for image tag
recommendation, making use of favorite image context. Our
experimental results show that, in general, the use of favorite
image context is more effective than the use of personal and
collective context. In addition, fusing tag statistics and visual
similarity allows for a higher effectiveness in terms of P@5,
compared to their separate usage.
Future research will analyze the computational complexity of
the proposed tag recommendation method in more detail. Attention
will also be paid to more advanced weighting schemes.
5. ACKNOWLEDGEMENTS
This research was supported by the Basic Science Research
Program of the National Research Foundation (NRF) of Korea,
funded by the Ministry of Education, Science and Technology
(research grant: 2010-0012495).
6. REFERENCES
[1] X. Li, C. G. M. Snoek, and M. Worring, Learning Social Tag
Relevance by Neighbor Voting, IEEE Transactions on
Multimedia, pp. 1310-1322, 2009.
[2] D. Liu, M. Wang, L. Yang, X. Hua, and H. J. Zhang, Tag
Quality Improvement for Social Images, Proc. of ICME, pp.
250-353, 2009.
[3] B. Sigurbjörnsson and R. van Zwol, Flickr Tag
Recommendation based on Collective Knowledge,
International Conference on World Wide Web, pp. 327-336,
2008.
[4] A. Rae, B. Sigurbjörnsson, and R. van Zwol, Improving Tag
Recommendation using Social Networks, International
Conference on Adaptivity, Personalization and Fusion of
Heterogeneous Information, 2010.
[5] N. Garg and I. Weber, Personalized, Interactive Tag
Recommendation for Flickr, ACM Conference on
Recommender Systems, pp. 67-74, 2008.
[6] R. van Zwol, A. Rae, and L. G. Pueyo, Prediction of
Favourite Photos using Social, Visual, and Textual Signals,
ACM Multimedia, pp. 1015-1018, 2010.
[7] M. J. Huiskes and M. S. Lew, The MIR Flickr Retrieval
Evaluation, ACM International Conference on Multimedia
Information Retrieval, pp. 39-43, 2008.
[8] S. Lee, W. De Neve, K. N. Plataniotis, and Y. M. Ro, MAP-
based Image Tag Recommendation using a Visual
Folksonomy, Pattern Recognition Letters, pp. 975-982,
2010.
[9] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to
MPEG-7: Multimedia Content Description Interface. John
Wiley and Sons, 2002.
[10] P. Tirilly, V. Claveau, and P. Gros, Language Modeling for
Bag-of-Visual Words Image Categorization, International
Conference on Content-based Image and Video Retrieval, pp.
249-258, 2008.