
CONTENT-BASED IMAGE RETRIEVAL USING FUZZY VISUAL REPRESENTATION
Anastasios D. Doulamis, Nikolaos D. Doulamis and Stefanos D. Kollias
National Technical University of Athens, Dept. of Electrical and Computer Engineering,
e-mail: adoulam@cs.ntua.gr

ABSTRACT

In this paper, a fuzzy approach is used for efficiently representing the visual content. Initially, the image is partitioned into several segments (objects) and for each segment appropriate features, such as the segment color and texture, are extracted. In the following, all features are classified in a fuzzy framework, resulting in a content interpretation closer to human perception. Furthermore, such a fuzzy representation of visual content gives users the capability of expressing video queries using statements drawn from natural language. An objective criterion has also been proposed to evaluate the system reliability, which indicates the advantages of the proposed fuzzy approach compared to other traditional techniques for image retrieval.
1 INTRODUCTION

In recent years, there has been a tremendous increase in multimedia data, due to the rapid progress in capturing, acquiring and storing audio-visual information. The traditional approach of keyword annotation for accessing image or video information has the drawback that, apart from the large amount of effort required for developing annotations, it cannot efficiently characterize the rich visual content using only text. Furthermore, the performance of such a system heavily depends on the keywords. For this reason, content-based retrieval algorithms have attracted great research interest [1]. In this framework, the MPEG group is currently defining a new standardization phase (MPEG-7) for a multimedia content description interface [2].
As a result, a lot of content-based retrieval systems have been developed and many prototypes have been built. The active research effort has been reflected in many special issues of leading journals dedicated to this topic [1]. Examples of products, which are now in the first stage of commercial exploitation, include the Virage, VisualSEEK and QBIC prototypes [3]. Color image retrieval has been examined in [4] based on a hidden Markov model. Object modeling and segmentation for indexing in video databases has been reported in [5], while a 3-D wavelet decomposition has been proposed in [6].

However, the performance of a content-based retrieval system mainly depends on an efficient representation of the visual content. In general, the traditional pixel-based representation suffers from the lack of a semantic meaning. On the other hand, it is too hard to extract semantic objects from any kind of images or video frames, regardless of luminosity conditions, scaling and orientation of the object, or even noise effects [7]. For this reason, several, usually low-level, features are extracted from an image and then efficiently combined to provide a more compact and semantic representation of the visual content.

In the context of this paper, a fuzzy representation of visual content is proposed, which improves the performance of video content-based retrieval algorithms, since it provides an interpretation closer to human perception. In contrast, the current approaches [4]-[6] are based on a "binary" classification, and therefore it is possible to assign two similar features (located near the class boundaries) to different classes, causing an erroneous representation. Furthermore, they are sensitive to possible noise and erroneous estimation of feature values.

Figure 1: The proposed architecture.

2 VISUAL CONTENT ANALYSIS

Figure 1 shows a block diagram of the proposed video/image content-based retrieval system. As can be observed, the architecture consists of two main modules, one responsible for video sequences and one for still images.

Video content-based retrieval: In the case of video sequences, a "pre-indexing" stage should be introduced, extracting a small set of characteristic frames (key-frames) by means of a content-based retrieval algorithm. This results in a video summarization scheme similar to that used in document search engines. The reason for summarizing the video content is the fact that the standard video representation, as a sequence of consecutive frames, results in significant temporal redundancy of visual content, which is not adequate for video browsing and content-based retrieval. In our case, video summarization is performed using the algorithm of [9], which discards frames or shots of similar visual content by minimizing a cross-correlation criterion. At this point, the problem of content-based video retrieval has actually been reduced to still image retrieval [10] and is thus handled similarly to the following case.
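Before turning to still images, the following Python sketch illustrates one simple way a cross-correlation criterion could be used to discard visually similar frames. It is only an illustration of the idea, not the algorithm of [9]; `select_key_frames`, `frame_features` and the threshold value are hypothetical names and parameters.

```python
import numpy as np

def select_key_frames(frame_features, corr_threshold=0.95):
    """Greedy key-frame selection sketch (illustration only).

    frame_features: (num_frames, dim) array of per-frame feature vectors
    (e.g., color histograms).  A frame becomes a new key-frame only if its
    normalized cross-correlation with the last kept key-frame drops below
    the threshold, i.e., its visual content differs sufficiently.
    """
    keys = [0]                                    # always keep the first frame
    for i in range(1, len(frame_features)):
        a = frame_features[keys[-1]] - frame_features[keys[-1]].mean()
        b = frame_features[i] - frame_features[i].mean()
        corr = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if corr < corr_threshold:                 # content changed enough
            keys.append(i)
    return keys

# usage with synthetic per-frame features
feats = np.random.rand(100, 64)
print(select_key_frames(feats))
```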
Image content-based retrieval: For each still image (or key-frame in the case of a video sequence), several features are extracted by examining global or local image properties. In the first case, global color, motion and texture characteristics are used to describe the visual content, while in the second a segmentation algorithm is applied to indicate the local image characteristics (see Figure 1). A multiresolution implementation of the Recursive Shortest Spanning Tree (RSST) algorithm is used in our case to perform the segmentation task. The use of the RSST is based on the fact that it is considered one of the most powerful tools for image segmentation compared to other techniques [11]. However, the complexity of the RSST still remains very high, especially for images of large size. In contrast, the proposed M-RSST approach yields a much faster execution time, while simultaneously keeping the performance at a similar level. In our case, the color and motion segments are used for describing the local visual content. In particular, the size, location and average color (motion) components of all color (motion) segments are gathered as color (motion) properties. Then, the aforementioned features are classified based on a fuzzy formulation scheme, and the resulting fuzzy feature vector of each still image (or key-frame) is stored in the database and used for content-based retrieval.
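The M-RSST itself is not detailed in this paper, so the sketch below only illustrates the multiresolution idea behind it: regions are found on a heavily decimated copy of the image and the resulting label map is projected back to the full resolution. The region-finding step used here (coarse color quantization plus connected components) is a simple stand-in, not the RSST merging procedure, and all names and parameters are illustrative.

```python
import numpy as np
from scipy import ndimage

def coarse_segments(image, downsample_factor=4, quant_levels=8):
    """Multiresolution segmentation sketch (stand-in for the M-RSST).

    image: (H, W, 3) float array with values in [0, 1].
    Regions are found on a decimated copy of the image (here simply as
    connected components of coarsely quantized color, NOT the RSST
    region-merging procedure) and the label map is then expanded back
    to the original resolution.  Working on the small image is what
    keeps the cost low for large frames.
    """
    small = image[::downsample_factor, ::downsample_factor]
    # fold the quantized color channels into one code per pixel
    codes = (small * (quant_levels - 1)).round().astype(int)
    flat = (codes[..., 0] * quant_levels + codes[..., 1]) * quant_levels + codes[..., 2]

    labels = np.zeros(flat.shape, dtype=int)
    next_label = 0
    for code in np.unique(flat):
        comp, n = ndimage.label(flat == code)     # connected components of this color
        labels[comp > 0] = comp[comp > 0] + next_label
        next_label += n

    # project the coarse labels back to full resolution (nearest neighbour)
    full = np.repeat(np.repeat(labels, downsample_factor, axis=0),
                     downsample_factor, axis=1)
    return full[:image.shape[0], :image.shape[1]]

# usage with a synthetic frame
img = np.random.rand(64, 96, 3)
seg = coarse_segments(img)
print(seg.shape, seg.max(), "segments")
```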
3 FUZZY LOGIC FOR VISUAL REPRESENTATION

Let us assume that K^c color and K^m motion segments have been extracted. Then, for each color segment S_i^c, i = 1, ..., K^c, an L^c × 1 vector s_i^c is formed, while for each motion segment S_i^m, i = 1, ..., K^m, an L^m × 1 vector s_i^m is formed as follows:

$\mathbf{s}_i^c = [\mathbf{c}^T(S_i^c)\ \mathbf{l}^T(S_i^c)\ a(S_i^c)]^T, \qquad \mathbf{s}_i^m = [\mathbf{v}^T(S_i^m)\ \mathbf{l}^T(S_i^m)\ a(S_i^m)]^T$   (1)

where a denotes the size of the color or motion segment, l is a 2 × 1 vector indicating the horizontal and vertical location of the segment center, the 3 × 1 vector c includes the average values of the three color components of the respective color segment, and the 2 × 1 vector v includes the average motion vector of the motion segment. For the sake of notational simplicity, the superscripts c and m will be omitted in the sequel; each color or motion segment will be denoted as S_i and will be described by the L × 1 vector s_i.
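As a small illustration of how a color-segment descriptor of the form (1) might be assembled from a segmentation label map, the sketch below gathers the average color c, the segment-center location l and the size a of every segment. Normalizing the location and size by the image dimensions is our assumption and is not stated in the paper.

```python
import numpy as np

def color_segment_vectors(image, labels):
    """Build one descriptor s_i = [c^T  l^T  a]^T per color segment.

    image:  (H, W, 3) array with the three color components in [0, 1].
    labels: (H, W) integer label map from the segmentation stage.
    Returns a (num_segments, 6) array: mean color (3), segment-center
    location normalized by the image size (2), and relative size (1).
    """
    h, w = labels.shape
    vectors = []
    for seg_id in np.unique(labels):
        mask = labels == seg_id
        ys, xs = np.nonzero(mask)
        c = image[mask].mean(axis=0)                    # average color, 3 values
        l = np.array([xs.mean() / w, ys.mean() / h])    # segment center, 2 values
        a = mask.sum() / (h * w)                        # segment size (area fraction)
        vectors.append(np.concatenate([c, l, [a]]))
    return np.vstack(vectors)

# usage with a toy image and label map
img = np.random.rand(32, 32, 3)
lab = np.arange(32 * 32).reshape(32, 32) // 256         # 4 dummy segments
print(color_segment_vectors(img, lab).shape)            # (4, 6)
```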
Based on the above, s_i = [s_{i,1} s_{i,2} ... s_{i,L}]^T is a vector containing all properties extracted from the ith segment S_i. Each element s_{i,j}, j = 1, ..., L, of vector s_i is then partitioned into Q regions by means of Q membership functions μ_{n_j}(s_{i,j}), n_j = 1, ..., Q. As in the previous case, μ_{n_j}(s_{i,j}) denotes the degree of membership of s_{i,j} to the n_j-th class. Then, the product of μ_{n_j}(s_{i,j}) over all s_{i,j} of s_i defines the degree of membership of vector s_i to the L-dimensional class n = [n_1 n_2 ... n_L]^T, the elements of which express the classes to which the elements of s_i belong:

$\mu_{\mathbf{n}}(\mathbf{s}_i) = \prod_{j=1}^{L} \mu_{n_j}(s_{i,j})$   (2)

Gathering all segments of a frame, a multidimensional fuzzy histogram is created,

$H(\mathbf{n}) = \frac{1}{K}\sum_{i=1}^{K}\mu_{\mathbf{n}}(\mathbf{s}_i) = \frac{1}{K}\sum_{i=1}^{K}\prod_{j=1}^{L}\mu_{n_j}(s_{i,j})$   (3)

H(n) can thus be viewed as a degree of membership of a whole frame to class n. A frame feature vector f is then formed by gathering the values of H(n) for all classes n, i.e., for all Q^L combinations of indices, resulting in a vector of Q^L elements.
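A minimal sketch of Eqs. (2) and (3) follows, assuming that every element of the segment vectors has been scaled to [0, 1] and that the Q membership functions are triangular with uniformly spaced centers (the exact partition layout is not specified in the paper). It accumulates the product memberships of all K segments into the multidimensional histogram H(n) and flattens it into the Q^L-element frame feature vector f.

```python
import itertools
import numpy as np

def triangular_memberships(x, Q):
    """Degrees of membership of a scalar x in [0, 1] to Q triangular
    fuzzy partitions with uniformly spaced centers."""
    centers = np.linspace(0.0, 1.0, Q)
    width = 1.0 / (Q - 1)
    return np.clip(1.0 - np.abs(x - centers) / width, 0.0, 1.0)   # shape (Q,)

def fuzzy_frame_vector(segment_vectors, Q=3):
    """Fuzzy histogram H(n) of Eq. (3) flattened into a Q^L feature vector.

    segment_vectors: (K, L) array of per-segment descriptors, each element
    assumed to be pre-scaled to [0, 1].
    """
    K, L = segment_vectors.shape
    H = np.zeros((Q,) * L)
    for s in segment_vectors:
        # mu[j, n_j]: membership of element s_j to partition n_j
        mu = np.vstack([triangular_memberships(s[j], Q) for j in range(L)])
        # degree of membership of s to every class n = (n_1, ..., n_L), Eq. (2)
        for n in itertools.product(range(Q), repeat=L):
            H[n] += np.prod([mu[j, n[j]] for j in range(L)])
    return (H / K).ravel()          # Q^L-element frame feature vector f

# usage: 5 segments described by L = 3 features, Q = 3 partitions
segs = np.random.rand(5, 3)
f = fuzzy_frame_vector(segs, Q=3)
print(f.shape)                      # (27,)
```

Note that the loop over all Q^L classes makes the cost of this naive implementation grow exponentially with Q, in line with the execution times reported in Table 1.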
3.1 Implementation Issues

The performance of the content-based retrieval mechanism depends on the selected values of the fuzzy representation: the shape of the membership functions and the number of partitions Q. For this reason, the following experiment is carried out to estimate the parameters of the fuzzy representation. In particular, fifteen image queries are submitted to a database consisting of 1250 images and key-frames about space. Each time, the 10 most appropriate images are returned. To evaluate the performance of the content-based retrieval system, a distance or similarity measure is used to find the set of images that best match the user's query. Let us denote as f_q the feature vector of the user's query and as f_i the respective vector of the ith image in the database. Then, the Euclidean distance is

$d(\mathbf{f}_q, \mathbf{f}_i) = \sqrt{\sum_{j=1}^{Q^L}\left(f_{q,j} - f_{i,j}\right)^2}$   (4)

where f_{q,j} and f_{i,j} are the jth elements of vectors f_q and f_i respectively. The similarity measure is normalized in the interval [0, 1] to allow comparisons between different user queries, which in general give different distance values. Then, for an image query, a similarity degree, say t_i, is assigned to all images of the database, which indicates how similar the content of the ith image is to the query. The absolute difference E_A between the normalized distance and the similarity degree over the best M retrieved images is used to evaluate the system performance for the user's query,

$E_A = \frac{1}{|S_M|}\sum_{i \in S_M}\left\| d_{nrm}(\mathbf{f}_q, \mathbf{f}_i) - t_i \right\|$   (5)

where S_M is the set containing the best M retrieved images for a given user's query, |S_M| its cardinality and d_{nrm}(·) the normalized distance. The difference E_A expresses how close the best M retrieved images are to the user's query. Another approach is to examine the performance of the system over all images relevant to the user's query, i.e., images of similarity degree t_i = 0,

$E_B = \frac{1}{|S_t|}\sum_{i \in S_t} d_{nrm}(\mathbf{f}_q, \mathbf{f}_i), \qquad S_t = \{\, i : t_i = 0 \,\}$   (6)

where |S_t| is the cardinality of set S_t.
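To make the evaluation concrete, the sketch below ranks database images by the Euclidean distance (4) between fuzzy feature vectors, normalizes the distances to [0, 1] (min-max normalization is our assumption, since the paper does not specify how the normalization is performed), and computes the errors E_A and E_B of Eqs. (5) and (6) from similarity degrees t_i, where t_i = 0 marks an image fully relevant to the query.

```python
import numpy as np

def euclidean(fq, fi):
    """Euclidean distance between two fuzzy frame feature vectors, Eq. (4)."""
    return np.sqrt(np.sum((fq - fi) ** 2))

def evaluate_query(f_query, f_database, t, M=10):
    """Compute the errors E_A (Eq. 5) and E_B (Eq. 6) for one query.

    f_query:    (D,) fuzzy feature vector of the query.
    f_database: (N, D) fuzzy feature vectors of the database images.
    t:          (N,) similarity degrees assigned to the database images,
                with t_i = 0 denoting an image fully relevant to the query.
    M:          number of best retrieved images considered for E_A.
    """
    d = np.array([euclidean(f_query, fi) for fi in f_database])
    # normalize distances to [0, 1]; min-max normalization assumed here
    d_nrm = (d - d.min()) / (d.max() - d.min() + 1e-12)

    best_M = np.argsort(d_nrm)[:M]                           # set S_M
    E_A = np.mean(np.abs(d_nrm[best_M] - t[best_M]))         # Eq. (5)

    relevant = np.nonzero(t == 0)[0]                         # set S_t = {i : t_i = 0}
    E_B = d_nrm[relevant].mean() if len(relevant) else 0.0   # Eq. (6)
    return E_A, E_B

# usage with synthetic data: 50 database images, 27-dimensional fuzzy vectors
rng = np.random.default_rng(0)
db = rng.random((50, 27))
q = rng.random(27)
t = rng.choice([0.0, 0.5, 1.0], size=50)
print(evaluate_query(q, db, t))
```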
                 Membership function type
    Q     Triangular   Trapezoid   Quadratic   Binary
    2        0.11         0.10        0.13       0.10
    3        1.32         1.23        1.98       1.02
    4        5.45         5.21        7.23       4.98
    5       20.67        20.02       26.65      19.86

Table 1: Execution times (in msec) of the fuzzy representation for different membership functions and partition numbers Q.

    Image size      RSST      M-RSST    Fuzzy   Color Hist.
    176 × 144        5.65      132.01    1.32      8.67
    352 × 288       44.21      382.34    1.32     28.35
    720 × 576      534.22     1360.90    1.32    103.45

Table 2: Execution times (in msec) of color segmentation (RSST and M-RSST), fuzzy representation (Q = 3 and triangular membership functions) and color histogram, for different image sizes.
Figure 2(a) illustrates the average error E_A for all 15 examined image queries versus the number of partitions Q for triangular, trapezoid and quadratic membership functions, along with the results obtained using binary classification. It is observed that a partition number Q = 3 and triangular membership functions yield the best performance. Similar results are presented in Figure 2(b), where the average error E_B over all fifteen queries is shown. Consequently, triangular membership functions and a partition number Q = 3 provide the best performance. Table 1 shows the execution times for different partition numbers Q and membership functions, to indicate the complexity of the proposed fuzzy representation scheme. As expected, the computational load increases exponentially with respect to the partition number. In contrast, the times remain almost the same for different membership functions, due to the fact that the implementation involves additional costs, such as comparisons, procedure calls and so on. In all cases, however, the execution times remain very small, just a few msec, even for large partition numbers (Q = 5). A comparison of the execution times for the fuzzy representation (in the case of Q = 3 and triangular membership functions) with the required times for color segmentation is presented in Table 2 for different image sizes. It can be seen that the required time for segmentation is much greater than the time for the fuzzy representation, especially for images of large size. Moreover, a large Q increases the storage requirements exponentially.
4 EXPERIMENTAL RESULTS

The proposed fuzzy representation of visual content has been evaluated using a large database consisting of MPEG-coded video sequences and several images compressed in JPEG format. In Figure 3(a), an image of a space shuttle is submitted as the user's query. The retrieval results are displayed in Figure 3(b), using the fuzzy parameters selected in the previous section. In the same figure, a comparison of the proposed method with two other methods is also presented: a binary classification and the traditional method of the color histogram [2] (Figure 3(c,d)). The comparison is also performed quantitatively using both the errors E_A and E_B over all 15 image queries of the experiment carried out in the previous section. The results for binary classification are illustrated in Figure 2, where it can be seen that the average performance error is higher in all cases compared to the fuzzy approach. For the color histogram method, we have measured an average error E_A = 0.52 and E_B = 0.24. As can be seen by comparing these values with those presented in Figure 2, the color histogram performance is worse for any partition number and membership function, since only global image characteristics are taken into consideration. The computational cost for binary classification is very small and the total cost is mainly affected by the segmentation load, resulting in a cost similar to the fuzzy approach (Table 2). In contrast, the color histogram method demands a smaller computational load compared to segmentation (Table 2). It is observed that the highest cost is for segmentation, while the load for the fuzzy representation is very small and independent of the image size. The color histogram requires the lowest cost but does not yield sufficient performance for the retrieval.

References

[1] Special Issue on "Segmentation, Description and Retrieval of Video Content," IEEE Trans. Circuits and Systems for Video Technology, vol. 8, no. 5, September 1998.

[2] ISO/IEC JTC1/SC29/WG11, "MPEG-7: Overview (version 1.0)," Doc. N3158, Hawaii, Dec. 1999.

[3] J. R. Smith and S. F. Chang, "VisualSEEk: A Fully Automated Content-Based Image Query System," Proc. ACM Multimedia Conf., Boston, MA, USA, Nov. 1996.

[4] H.-C. Lin, L.-L. Wang and S.-N. Yang, "Color Image Retrieval Based on Hidden Markov Models," IEEE Trans. Image Processing, vol. 6, pp. 332-339, Feb. 1997.

[5] M. Gelgon and P. Bouthemy, "A Hierarchical Motion-Based Segmentation and Tracking Technique for Video Storyboard-Like Representation and Content-Based Indexing," Proc. of WIAMIS, Belgium, June 1997.

[6] J. Nam and A. Tewfik, "Progressive Resolution Motion Indexing of Video Object," Proc. of IEEE ICASSP, Seattle, WA, USA, May 1998.

[7] N. Doulamis, A. Doulamis, D. Kalogeras and S. Kollias, "Very Low Bit-Rate Coding of Image Sequences Using Adaptive Regions of Interest," IEEE Trans. CSVT, vol. 8, pp. 928-934, Dec. 1998.

[8] Y. Avrithis, A. Doulamis, N. Doulamis and S. Kollias, "A Stochastic Framework for Optimal Key Frame Extraction from MPEG Video Databases," Computer Vision and Image Understanding, vol. 75, pp. 3-24, July 1999.

[9] A. D. Doulamis, Y. S. Avrithis, N. D. Doulamis and S. D. Kollias, "Interactive Content-Based Retrieval in Video Databases Using Fuzzy Classification and Relevance Feedback," Proc. of IEEE ICMCS, Florence, Italy, June 1999.

[10] A. Alatan, L. Onural, M. Wollborn, R. Mech, E. Tuncel and T. Sikora, "Image Sequence Analysis for Emerging Interactive Multimedia Services - The European COST 211 Framework," IEEE Trans. CSVT, vol. 8, pp. 802-813, Nov. 1998.
Figure 2: Average performance error over 15 image queries versus the partition number Q for triangular, trapezoid, quadratic and binary membership functions: (a) error E_A, (b) error E_B.

Figure 3: An image query. (a) The submitted image. (b) The best five retrieved images using the proposed fuzzy classification scheme, (c) using binary classification, (d) using the color histogram.
