

Wonyong Eom, Sihyoung Lee, Wesley De Neve, and Yong Man Ro

Image and Video Systems Lab, Korea Advanced Institute of Science and Technology (KAIST),
Yuseong-gu, Daejeon, Republic of Korea
{ewony, ijiat, wesley.deneve, ymro}@kaist.ac.kr


Tag recommendation can reduce the amount of user effort
needed to annotate images. Assuming that favorite images and
their associated tags are indicative of the visual and topical
interests of users, this paper proposes a personalized image tag
recommendation technique that makes use of favorite image
context. Specifically, to recommend tags for a newly uploaded
image, we propose to take advantage of the tags assigned to
favorite images of the user who uploaded the image, fusing tag
statistics and visual similarity. Experimental results obtained for
images and tags retrieved from Flickr indicate that the use of
favorite image context for the purpose of tag recommendation is
promising, compared to the use of personal and collective context.

Index Terms: Annotation, favorite image context, tag
recommendation, tagging, Flickr


Thanks to easy-to-use multimedia devices, cheap storage and
bandwidth, and an increasing number of people going online, the
number of photos shared on online social network services keeps
growing at a fast rate. To facilitate effective image search and
management, online social network services typically make use of
tags. However, manual tagging of images is labor intensive. As a
result, images shared on online social network services are often
not or only weakly annotated [1][2].
Tag recommendation can reduce the amount of user
effort needed to annotate images. In [3], an image tag
recommendation method is proposed that makes use of all images
and tags available on an online social network service (collective
context). That way, a wide range of tags can be suggested.
However, tag recommendation using collective context is not
personalized [4]. Moreover, the use of collective context is
computationally expensive.
In [5], an image tag recommendation method is proposed that
makes use of images and tags previously made available by a user
(personal context). Compared to the use of collective context, the
use of personal context makes it possible to better reflect the
interests of users. In addition, the computational complexity is
lower (only a limited number of already annotated images are
used). However, the effectiveness of using personal context for
image tag recommendation is highly dependent on past tagging
behavior of a user [4]. Moreover, the tags suggested using personal
context are topically limited.
Users of online services for image sharing interact with each
other. Motivated by this observation, the authors of [4] investigate
image tag recommendation using contact and group context.
Whereas contact context is derived from images and tags made
available by the contacts of a user, group context is derived from
images and tags that belong to a pool of groups joined by the user.
The experimental results reported in [4] demonstrate that group
context can be successfully used to boost the effectiveness of
image tag recommendation.
As concluded in [4], additional types of context need to be
investigated. Moreover, the tag recommendation techniques
presented in [4] only make use of tag statistics, and do not take
advantage of visual similarity. This paper addresses both of the
aforementioned challenges, focusing on using a new type of
context for the purpose of image tag recommendation. In particular,
assuming that favorite images and their associated tags are
indicative of the visual and topical interests of a user [6], this paper
proposes to make use of favorite image context in order to realize
personalized image tag recommendation, fusing tag statistics and
visual similarity. Experimental results obtained for images and tags
retrieved from Flickr indicate that the use of favorite image context
for the purpose of tag recommendation is promising, compared to
the use of personal and collective context.
The remainder of this paper is organized as follows. In
Section 2, we introduce our novel method for image tag
recommendation, relying on favorite image context. Experimental
results are discussed in Section 3. Finally, conclusions and
directions for future research are presented in Section 4.


2.1. Favorite image context

2.1.1. Definition
In this paper, we assume that a set of favorite images and their
associated tags reflect the visual and topical interests of a user
(given that favorite images have been explicitly bookmarked by the
user). Fig. 1 shows the relationship between personal, collective,
and the proposed favorite image context, visualized from the point-
of-view of a user who uploaded an image that needs to be
annotated. A solid line between a user and an image indicates that
the user owns the image, whereas a dotted line between a user and
an image signals that the user bookmarked the image as a favorite
image. Finally, the three bounding boxes denote the three types of
contexts used in this paper.
[Figure: three bounding boxes labeled personal context, favorite image context, and collective context.]

Fig. 1. Personal, collective, and favorite image context,
visualized from the point-of-view of a user who uploaded a new
image that needs to be annotated.

2.1.2. Favorite images in MIRFLICKR-25000
Similar to the contact and group context in [4], the activity of a
user may have a significant influence on the size of the favorite
image context and on the effectiveness of image tag
recommendation. We therefore investigated the distribution of the
number of favorite images on Flickr for the 9,861 users of the
publicly available MIRFLICKR-25000 image set [7], assuming
that their behavior in terms of image bookmarking is representative
of the whole Flickr population.
Fig. 2 shows the distribution of the number of favorite images
on Flickr for the users of MIRFLICKR-25000. The x axis
represents the 9,861 users, whereas the y axis represents the
number of favorite images using a log scale. We can observe that
455 users did not bookmark any images, while 187 users
bookmarked more than 10,000 favorite images. In addition, about
40% of the users bookmarked at least 500 favorite images (the
median number of images bookmarked by the users of
MIRFLICKR-25000 is 301).
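The summary statistics quoted above (users without favorites, users with more than 10,000 favorites, the share with at least 500, and the median) can be computed from a list of per-user favorite counts. The sketch below assumes such a list is already available, for instance collected through the Flickr API; the function name `favorite_count_summary` and the toy counts are illustrative only and are not MIRFLICKR-25000 data.

```python
from statistics import median


def favorite_count_summary(counts: list[int]) -> dict[str, float]:
    """Summarize per-user favorite-image counts (one integer per user)."""
    if not counts:
        raise ValueError("need at least one user")
    n = len(counts)
    return {
        "users": n,
        "no_favorites": sum(1 for c in counts if c == 0),
        "more_than_10000": sum(1 for c in counts if c > 10_000),
        "share_at_least_500": sum(1 for c in counts if c >= 500) / n,
        "median": median(counts),
    }


# Toy counts for five hypothetical users.
print(favorite_count_summary([0, 12, 301, 650, 12_000]))
```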

2.2. Tag recommendation using favorite image context

The relevance of a set of tags $T^*$ to a query image $q$ is defined as
$R(T^*, q)$. If $R(T^*, q)$ is maximal, then we assume that the tags in $T^*$
are correct for $q$. Maximizing the relevance of $T^*$ to $q$ can be
expressed as follows:

$$T^* = \arg\max_{T} R(T, q). \quad (1)$$
The computation of $R(T^*, q)$ requires evaluating a significant
number of tag combinations. Therefore, to reduce the
computational complexity, we assume that the tags in $T$ are
statistically independent. Tag recommendation can then be
described using the following equation:

$$T^* = \{\, t \mid t \in T \text{ and } R(t, q) > \varepsilon \,\}, \quad (2)$$

where $t$ is a tag associated with $q$, $R(t, q)$ is the relevance of $t$ to $q$,
and $\varepsilon$ is a threshold that determines whether $t$ is correct or noisy
with respect to $q$.
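Under the independence assumption, recommendation reduces to scoring each candidate tag on its own and keeping those above the threshold: with |T| candidates this takes |T| relevance evaluations, whereas an exhaustive search over subsets of T would take 2^|T|. A minimal sketch, in which the query image is folded into a hypothetical `relevance` callable and `eps` plays the role of the threshold:

```python
from typing import Callable, Iterable


def recommend_tags(
    candidates: Iterable[str],
    relevance: Callable[[str], float],
    eps: float,
) -> list[str]:
    """Return the candidate tags t with relevance(t) > eps, highest relevance first.

    `relevance` stands for R(t, q) with the query image q held fixed.
    """
    scored = [(relevance(t), t) for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [t for score, t in scored if score > eps]


# Hypothetical relevance values for one query image.
scores = {"street": 0.6, "bw": 0.3, "japan": 0.05}
print(recommend_tags(scores, lambda t: scores[t], eps=0.1))  # ['street', 'bw']
```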

[Figure: number of favorite images (y axis, log scale) for each of the 9,861 users (x axis).]
Fig. 2. Number of favorite images per MIRFLICKR-25000 user.

The relevance function R(t, q) is modeled by combining the
output of two elementary relevance functions, making it possible to
suggest tags that are highly ranked by both of the two elementary
relevance functions. To combine the output of the two elementary
relevance functions, we make use of a weighted summation:

$$R(t, q) = \alpha\, R_{tag}(t, q) + (1 - \alpha)\, R_{img}(t, q), \quad (3)$$

where $R_{tag}(t, q)$ is an elementary relevance function that makes use
of tag statistics and where $R_{img}(t, q)$ is an elementary relevance
function that relies on image similarity. The parameter $\alpha$ represents
a normalized weight value, making it possible to trade off the
influence of tag statistics against the influence of visual similarity.
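For a weight in [0, 1], the weighted summation is a convex combination of the two elementary scores. The sketch below applies it to per-tag score dictionaries. It assumes both scores are already on comparable scales (the text does not say whether or how they are normalized) and, as a further assumption, gives a tag scored by only one of the two functions a value of 0 for the other; `fuse_relevance` is a hypothetical name.

```python
def fuse_relevance(
    r_tag: dict[str, float], r_img: dict[str, float], alpha: float
) -> dict[str, float]:
    """Combine the two scores per tag as alpha * R_tag + (1 - alpha) * R_img.

    A tag missing from one dictionary is treated as scoring 0 there (assumption).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    tags = r_tag.keys() | r_img.keys()
    return {
        t: alpha * r_tag.get(t, 0.0) + (1.0 - alpha) * r_img.get(t, 0.0)
        for t in tags
    }


fused = fuse_relevance({"street": 0.8, "film": 0.2}, {"street": 0.4, "night": 0.6}, alpha=0.5)
print(sorted(fused, key=fused.get, reverse=True))  # ['street', 'night', 'film']
```

A tag that scores moderately under both functions ("street") can outrank tags favored by only one of them, which is the effect the fusion is meant to achieve.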
First, $R_{tag}(t, q)$ is modeled by making use of the method
proposed in [3]. As this method relies on tag occurrence and tag
co-occurrence statistics, initial tags are needed. Similar to [4], we
randomly select two initial tags from the tags already assigned to
the test image by the user and measure the effectiveness in terms of
how many ground truth tags can be recommended (see Section 3.1).
Given a certain type of context and a set $V$ of initial tags $v$ for $q$,
the relevance value of each candidate tag $t$ is then modeled as

$$R_{tag}(t, q) = P(t) \prod_{v \in V} \begin{cases} P(t \mid v), & \text{if } P(t \mid v) \neq 0, \\ \delta, & \text{otherwise}, \end{cases} \quad (4)$$

where $\delta$ is a non-zero value significantly smaller than the lowest
$P(t \mid v)$ in order to avoid the zero-probability problem (please see [4]
for further details).
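A minimal sketch of this tag-statistics score, assuming the context is given as a list of tag sets (one per image) and estimating P(t) and P(t | v) from image counts as |I_t| / |I| and (number of images tagged with both t and v) / |I_v|, as defined below. The fallback value `delta` (1e-6 here) is an arbitrary illustrative choice; the text only requires it to be much smaller than the smallest non-zero P(t | v). Function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def context_statistics(images: list[set[str]]) -> tuple[int, Counter, Counter]:
    """Return |I|, per-tag image counts |I_t|, and pairwise co-occurrence counts."""
    occurrence: Counter = Counter()
    cooccurrence: Counter = Counter()
    for tags in images:
        occurrence.update(tags)
        for a, b in combinations(sorted(tags), 2):
            cooccurrence[(a, b)] += 1  # store both orders for easy lookup
            cooccurrence[(b, a)] += 1
    return len(images), occurrence, cooccurrence


def r_tag(t, initial_tags, n_images, occurrence, cooccurrence, delta=1e-6):
    """P(t) times, for each initial tag v, P(t | v), or delta when P(t | v) == 0.

    Candidates are assumed to exclude the initial tags themselves.
    """
    score = occurrence[t] / n_images
    for v in initial_tags:
        p_t_given_v = cooccurrence[(t, v)] / occurrence[v] if occurrence[v] else 0.0
        score *= p_t_given_v if p_t_given_v > 0 else delta
    return score


context = [
    {"street", "city", "bw"},
    {"street", "night"},
    {"city", "night", "bw"},
    {"street", "city"},
    {"macro", "flower"},
]
stats = context_statistics(context)
for tag in ("city", "bw", "macro"):
    print(tag, r_tag(tag, ["street"], *stats))
```

Here "macro" never co-occurs with "street", so without `delta` its score would collapse to exactly zero; with it, the tag keeps a tiny but non-zero score.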
The probability of a tag $t$ occurring and the conditional
probability of two tags $t$ and $v$ co-occurring in a certain type of
context are defined as follows:

$$P(t) = \frac{|I_t|}{|I|}, \quad (5)$$

$$P(t \mid v) = \frac{|I_{t,v}|}{|I_v|}, \quad (6)$$

where $|I|$ represents the total number of images in a certain type of
context, where $|I_t|$ and $|I_v|$ respectively denote the number of images
annotated with $t$ and $v$ in the context used, and where $|I_{t,v}|$
indicates the number of images annotated with both $t$ and $v$ in the
context under consideration.
Second, $R_{img}(t, q)$ is modeled using the method outlined in [8]:

$$R_{img}(t, q) = P(t \mid q, Q) = \frac{P(q \mid t, Q)\, P(t \mid Q)}{P(q \mid Q)}, \quad (7)$$

where $Q$ consists of images visually similar to $q$. The denominator
is constant and can be ignored (as the denominator is not a
function of $t$).
The first conditional probability in (7) is modeled using a
Gaussian distribution (please see [8] for further details). The
second conditional probability in (7) can be expressed as follows:

$$P(t \mid Q) = \frac{|I_{Q,t}|}{|I_Q|}, \quad (8)$$

where $|I_Q|$ represents the number of images in $Q$ and $|I_{Q,t}|$ denotes
the number of images in Q annotated with tag t.
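The sketch below combines these pieces for one candidate tag, dropping the constant denominator P(q | Q). The Gaussian form used for P(q | t, Q), an average of Gaussian kernels exp(-d²/(2σ²)) over the images in Q tagged with t, is an assumption made for illustration; [8] should be consulted for the exact model. The distance function (Euclidean in the usage line), `sigma`, and the name `r_img` are likewise placeholders.

```python
import math
from typing import Callable, Sequence

Feature = Sequence[float]


def r_img(
    t: str,
    q: Feature,
    neighbors: list[tuple[Feature, set[str]]],
    distance: Callable[[Feature, Feature], float],
    sigma: float = 1.0,
) -> float:
    """Unnormalized P(t | q, Q), i.e. P(q | t, Q) * P(t | Q).

    `neighbors` holds (feature vector, tag set) pairs for the images in Q.
    """
    tagged = [feat for feat, tags in neighbors if t in tags]
    if not tagged:
        return 0.0
    prior = len(tagged) / len(neighbors)  # P(t | Q) = |I_{Q,t}| / |I_Q|
    # Assumed Gaussian model: mean kernel value between q and the images tagged with t.
    likelihood = sum(
        math.exp(-distance(q, f) ** 2 / (2 * sigma**2)) for f in tagged
    ) / len(tagged)
    return likelihood * prior


Q = [((0.0, 0.0), {"street"}), ((1.0, 0.0), {"street"}), ((3.0, 4.0), {"night"})]
for tag in ("street", "night", "bw"):
    print(tag, r_img(tag, (0.0, 0.0), Q, math.dist))
```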


3.1. Experimental setup

We assume that users actively bookmark favorite images on Flickr.
Following this assumption, we collected images from users
meeting the following requirements: 1) the users uploaded at least
100 images; 2) the users assigned at least 500 tags; and 3) the users
bookmarked at least 500 favorite images. As a result, using the
Flickr API, we retrieved a total of 387,397 images (on September
30, 2010) from 27 users. The images retrieved are either favorite
images or owned by the 27 users selected. Moreover, the images
are annotated with a total of 4,657,288 tags by 46,686 users.
To study the influence of the number of images in the favorite
image context on the effectiveness of tag recommendation, we
divided the 27 users into four groups according to their
bookmarking activity. These four groups are listed in Table 1. The
average number of favorite images per user in MIRFLICKR-25000
was used to create a group that represents users with average
bookmarking activity (Level 1 in Table 1).
Table 1. Minimum and maximum number of favorite images for
each group of users.
Activity Level 1 Level 2 Level 3 Level 4
Number of

For testing purposes, we randomly selected 378 images from
the images uploaded by the 27 users (14 test images per user),
making sure that each selected test image was annotated with at
least ten tags. Further, we removed images that belong to the same
event. This finally resulted in the use of 342 test images. Experts created
a ground truth by making use of the tags suggested by all of the tag
recommendation methods studied, selecting tags that describe
visual aspects of the test images used.
To calculate the distance between images, we used global and
local image features. The 128-D MPEG-7 Scalable Color
Descriptor (SCD) [9] was used as a global image feature, whereas
local image features were created using the Bag of Visual Words
(BoVW) approach [10]. BoVW used a vocabulary of 500 visual
words, derived from 61,901 training images. Interest points were
detected, described, and clustered using Difference of Gaussians
(DoG), the Scale Invariant Feature Transform (SIFT), and K-
means clustering, respectively. To measure the distance between
two images in feature space, the L
metric was used for both the
global and local image features.
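The BoVW quantization step can be sketched as follows: each local descriptor (a 128-D SIFT vector in the setup above) is assigned to its nearest visual word, that is, the nearest k-means centroid, and the image is represented by the resulting word histogram. The toy 2-D vocabulary, the Euclidean assignment distance, and the normalization of the histogram to sum to one are assumptions for illustration; `bovw_histogram` is a hypothetical name.

```python
import math
from typing import Sequence


def bovw_histogram(
    descriptors: Sequence[Sequence[float]],
    vocabulary: Sequence[Sequence[float]],
) -> list[float]:
    """Map local descriptors to their nearest visual word; return a normalized histogram."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda k: math.dist(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Two visual words (centroids) and three descriptors; the paper used 500 words.
vocab = [(0.0, 0.0), (10.0, 10.0)]
print(bovw_histogram([(1.0, 1.0), (9.0, 9.0), (10.0, 11.0)], vocab))
```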
We measured the effectiveness of tag recommendation using
the following metrics: the precision of the top five recommended
tags (P@5), success among the top five recommended tags (S@5),
and the precision of the top one recommended tag (P@1).
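The three metrics can be written down directly. The sketch follows the usual definitions (P@k: fraction of correct tags among the top k; S@k: 1 if at least one of the top k tags is correct, else 0), averaged over test images. It assumes each test image comes with a ranked tag list and a ground-truth tag set, and it divides by k even when fewer than k tags are returned, which is one possible convention. The example ground-truth sets are made up.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommended tags that are in the ground truth."""
    return sum(1 for t in ranked[:k] if t in relevant) / k


def success_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one of the top-k recommended tags is correct, else 0.0."""
    return 1.0 if any(t in relevant for t in ranked[:k]) else 0.0


def evaluate(runs: list[tuple[list[str], set[str]]]) -> dict[str, float]:
    """Average P@5, S@5 and P@1 over (ranked tags, ground truth) pairs, one per test image."""
    n = len(runs)
    return {
        "P@5": sum(precision_at_k(r, g, 5) for r, g in runs) / n,
        "S@5": sum(success_at_k(r, g, 5) for r, g in runs) / n,
        "P@1": sum(precision_at_k(r, g, 1) for r, g in runs) / n,
    }


runs = [
    (["street", "film", "bw", "city", "japan"], {"street", "city"}),
    (["nature", "photo", "image", "moth", "bird"], {"bird"}),
]
print(evaluate(runs))
```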
3.2. Experimental results

3.2.1. Effectiveness of using favorite image context
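Our first experiment made use of $R_{tag}(t, q)$ to recommend tags.
Table 2 shows that, compared to the use of personal and collective
context, the use of favorite image context allows for a relative
improvement of 36% and 16% in terms of P@5, respectively. These
percentages express the P@5 gain as a fraction of the P@5 obtained
with favorite image context; relative to the personal and collective
baselines, the gains amount to 56% and 19%.
Table 2. Tag recommendation using $R_{tag}(t, q)$.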
Context P@5 S@5 P@1
Personal 0.158 0.609 0.318
Collective 0.208 0.612 0.373
Favorite 0.247 0.729 0.457

Our second experiment made use of $R_{img}(t, q)$ to recommend
tags (using SCD). Table 3 shows that the use of favorite image
context is more effective than the use of personal and collective
context, irrespective of the metric used. As an example, compared
to personal and collective context, favorite image context allows
for relative improvements of 36% and 29% in terms of P@5,
respectively (again expressed as a fraction of the favorite-context
P@5; relative to the baselines, the gains are 57% and 41%).
Table 3. Tag recommendation using SCD-based $R_{img}(t, q)$.
Context P@5 S@5 P@1
Personal 0.187 0.629 0.384
Collective 0.208 0.611 0.324
Favorite 0.294 0.813 0.446
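The sketch below recomputes the P@5 gains of favorite image context from Tables 2 and 3 under both normalizations: dividing the difference by the favorite-context P@5 reproduces the percentages quoted in the text, whereas dividing by the baseline P@5 gives the conventional relative improvement. The function name is hypothetical; the input values are copied from the tables.

```python
def relative_gains(personal: float, collective: float, favorite: float) -> dict[str, int]:
    """Percentage P@5 gain of favorite image context over each baseline, two normalizations."""
    return {
        "vs_personal_div_favorite": round(100 * (favorite - personal) / favorite),
        "vs_collective_div_favorite": round(100 * (favorite - collective) / favorite),
        "vs_personal_div_baseline": round(100 * (favorite - personal) / personal),
        "vs_collective_div_baseline": round(100 * (favorite - collective) / collective),
    }


# P@5 values (personal, collective, favorite) from Tables 2 and 3.
for table, p5 in {"Table 2": (0.158, 0.208, 0.247), "Table 3": (0.187, 0.208, 0.294)}.items():
    print(table, relative_gains(*p5))
```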

Our third experiment made use of $R_{img}(t, q)$ to recommend tags
(using BoVW). Table 4 shows that the use of favorite image
context is most effective in terms of P@5 and S@5, while the use
of collective context is most effective in terms of P@1.
Table 4. Tag recommendation using BoVW-based $R_{img}(t, q)$.
Context P@5 S@5 P@1
Personal 0.206 0.697 0.367
Collective 0.309 0.767 0.523
Favorite 0.317 0.813 0.513

3.2.2. Influence of fusion and bookmarking activity
Fig. 3 shows the influence of bookmarking activity on the
effectiveness of tag recommendation when making use of favorite
image context. The following weight values were heuristically
selected for $\alpha$, optimizing the effectiveness of tag recommendation:
$\alpha = 0.45$ when using SCD and $\alpha = 0.2$ when using BoVW. We can
observe that the combined use of tag statistics and visual similarity
is more effective than the separate use of either tag statistics or
visual similarity. Indeed, the combined use of tag statistics and
visual similarity makes it possible to suggest tags that are ranked
high by both elementary relevance functions. Also, the
combined use of tag statistics and BoVW is more effective than the
combined use of tag statistics and SCD.
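The text does not describe how the weights were selected beyond calling the choice heuristic. One simple way to pick α, shown here only as a hypothetical sketch, is a grid search over [0, 1] that keeps the value with the highest mean P@5; `mean_p5` stands for any routine that runs the fused recommender with a given α and scores it, ideally on images held out from the final test set.

```python
from typing import Callable


def select_alpha(mean_p5: Callable[[float], float], steps: int = 20) -> float:
    """Grid-search alpha over {0, 1/steps, ..., 1} and return the best-scoring value."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=mean_p5)  # first maximum wins on ties


# Toy objective peaking at 0.45, standing in for a real P@5 evaluation.
print(select_alpha(lambda a: -(a - 0.45) ** 2))
```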
Fig. 4 shows three example images. For each of these images,
the top 10 recommended tags are provided and sorted according to
their rank. The rank of a recommended tag is determined using
R(t,q). Correct tags (i.e., tags that belong to the ground truth) have
been underlined in Fig. 4. We can observe that fusing tag statistics
and visual similarity allows suggesting more relevant tags than the
separate use of either tag statistics or visual similarity.

[Figure: three example images.]
Top 10 recommended tags per example image, sorted by rank:
Tag statistics:
Image 1: october, street, film, michigan, bw, california, color, architecture, city, japan
Image 2: nature, africa, photo, image, moth, bird, wildlife, birds, macro, Australia
Image 3: sardegna, mare, donna, fitness, street, red, luce, bw, green, light
Visual similarity:
Image 1: blue, street, film, clouds, night, black, bw, history, rome, color
Image 2: nature, wildlife, macro, birds, africa, flower, animal, flowers, safari, butterfly
Image 3: italy, bw, red, milano, green, street, silhouette, people, paris, canon
Tag statistics + visual similarity:
Image 1: street, blue, film, black, color, city, night, history, architecture, city
Image 2: nature, wildlife, macro, birds, africa, animal, bird, flower, safari, flowers
Image 3: italy, bw, red, green, street, milano, silhouette, light, sardegna, shadow
Fig. 4. Example images with recommended tags using favorite image context.

[Figure: P@5 (y axis) per type of user group, Level 1 to Level 4 (x axis), for tag statistics, visual similarity (SCD), visual similarity (BoVW), tag statistics + visual similarity (SCD), and tag statistics + visual similarity (BoVW).]

Fig. 3. P@5 for different levels of bookmarking activity.


This paper proposed a novel method for image tag
recommendation, making use of favorite image context. Our
experimental results show that, in general, the use of favorite
image context is more effective than the use of personal and
collective context. In addition, fusing tag statistics and visual
similarity allows for a higher effectiveness in terms of P@5,
compared to their separate usage.
Future research will analyze the computational complexity of
the proposed tag recommendation method in more detail. Attention
will also be paid to more advanced weighting schemes.

This research was supported by the Basic Science Research
Program of the National Research Foundation (NRF) of Korea,
funded by the Ministry of Education, Science and Technology
(research grant: 2010-0012495).


[1] X. Li, C.G.M. Snoek, M. Worring, Learning Social Tag
Relevance by Neighbor Voting, IEEE Transactions on
Multimedia, pp. 1310-1322, 2009.
[2] D. Liu, M. Wang, L. Yang, X. Hua, and H. J. Zhang, Tag
Quality Improvement for Social Images, Proc. of ICME, pp.
250-353, 2009.
[3] B. Sigurbjörnsson and R. van Zwol, Flickr Tag
Recommendation based on Collective Knowledge,
International Conference on World Wide Web, pp. 327-336,
[4] A. Rae, B. Sigurbjörnsson, R. van Zwol, Improving Tag
Recommendation using Social Networks, International
Conference on Adaptivity, Personalization and Fusion of
Heterogeneous Information, 2010.
[5] N. Garg and I. Weber, Personalized, Interactive Tag
Recommendation for Flickr, ACM Conference on
Recommender Systems, pp. 67-74, 2008.
[6] R. van Zwol, A. Rae, and L. G. Pueyo, Prediction of
Favourite Photos using Social, Visual, and Textual Signals,
ACM Multimedia, pp. 1015-1018, 2010.
[7] M. J. Huiskes, M. S. Lew, The MIR Flickr Retrieval
Evaluation, ACM International Conference on Multimedia
Information Retrieval, pp. 39-43, 2008.
[8] S. Lee, W. De Neve, K. N. Plataniotis, Y. M. Ro, MAP-
based Image Tag Recommendation using a Visual
Folksonomy, Pattern Recognition Letters, pp. 975-982,
[9] B. S. Manjunath, P. Salembier, T. Sikora, Introduction to
MPEG-7 : Multimedia Content Description Interface. John
Wiley and Sons, 2002.
[10] P. Tirilly, V. Claveau, and P. Gros, Language Modeling for
Bag-of-Visual Words Image Categorization, International
Conference on Content-based Image and Video Retrieval, pp.
249-258, 2008.
