
International Journal of Computer Trends and Technology (IJCTT), Volume 6, Number 3, December 2013

ISSN: 2231-2803    http://www.ijcttjournal.org



A Survey on Adaptive Weighted Spatio-Temporal
Pyramid Matching for Video Retrieval

Priya Iype, M.Tech
Department of Computer Science and Engineering, Karunya University, Coimbatore, Tamil Nadu, India


Abstract: Human action recognition in video has recently become an important and challenging research topic in the field of video analysis and retrieval. Efficient video retrieval is needed to find the most similar and relevant video content within a large set of video clips, and many methods have been proposed for this purpose. In this work, an adaptive weighted pyramid matching (AWPM) kernel is used to retrieve videos efficiently by recognizing human actions in realistic videos. It operates on a multi-channel bag-of-words representation constructed from the local spatio-temporal features of video clips. AWPM extends the spatio-temporal pyramid matching (STPM) kernel by leveraging both the level of spatio-temporal granularity and multiple types of feature descriptors to build a suitable similarity metric between two video clips. Whereas STPM uses predefined, fixed weights, the proposed matching algorithm adaptively estimates the channel weights based on the kernel target alignment of the training data. The following work analyses content-based video retrieval over large databases through the various mechanisms available in the literature.

I. INTRODUCTION
The convenient access to networked multimedia devices
and multimedia hosting services has contributed to a huge
increase in network traffic and data storage. Many video
hosting companies such as Vimeo and YouTube, and major IT
companies such as Apple and Google, have started to offer
cloud-based audio and video storage services to their customers.
Recent attempts to deploy content-based search beyond images,
such as content-based video search and automatic tagging based
on face recognition, are still under development.

Two main observations can be made to explain this
challenge:
1) The temporal information in videos adds complexity to the data.
2) Querying tools are required, since the user does not have sample videos at hand for the query.
The first observation implies that queries are more complex than
typical text-based ones; a more elaborate querying system, such as
the dynamic construction of a hierarchical structure over videos,
requires more elaborate queries and is therefore error prone. The
second observation, however, is no longer valid, because mobile
devices such as PDAs, camera phones and digital cameras enable
instant image and video recording that can be applied and used as
video queries. With the spread of mobile devices and the growth of
multimedia, a content-based video query system takes a sample
video clip as the query, searches the collection of videos stored in
multimedia portal services such as Google Video, Vimeo, Yahoo
and YouTube, and suggests similar video clips from the database
together with a relevance measure. Several methods for
content-based video retrieval have been surveyed and are discussed
in the following section.
II. METHODS USED FOR VIDEO RETRIEVAL

Various methods that address the challenges and characteristics of
content-based image and video query systems are discussed
below:
A. Automatic video annotation
A significant amount of research has been done on automatic
image annotation, and researchers have therefore recently shifted
their focus towards automatic video annotation.

1) Semi-supervised kernel density estimation:
Two main difficulties commonly arise in video annotation and
retrieval: the large variation of video content and the large
computational cost involved. To address this, semi-supervised
kernel density estimation has been used. The method in [1] is
built on the kernel density estimation approach and improves it:
the kernels over the observed samples are adapted by adopting
border kernels in regions of low density, so that the density can
be estimated more accurately. Experimental results show that this
method improves the performance of semi-supervised kernel
density estimation for video annotation and that it performs
better than supervised methods.
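
As a rough illustration of the underlying idea, a minimal Python sketch of label scoring by kernel density estimation over annotated clip features is given below. The Gaussian kernel, the bandwidth h and the toy feature vectors are assumptions made for this example; the method of [1] additionally exploits unlabelled data and adapts the kernels near low-density borders.

```python
import numpy as np

def gaussian_kernel(x, xi, h):
    """Isotropic Gaussian kernel between a query feature x and a sample xi."""
    d = x - xi
    return np.exp(-np.dot(d, d) / (2.0 * h * h))

def kde_label_score(x, labelled_feats, labels, h=1.0):
    """Estimate P(label | x) by kernel density estimation over labelled clips.

    labelled_feats: (n, d) array of feature vectors with known annotations.
    labels:         (n,) array of 0/1 concept labels (e.g. "contains action").
    """
    weights = np.array([gaussian_kernel(x, xi, h) for xi in labelled_feats])
    if weights.sum() == 0:
        return 0.5  # no nearby evidence either way
    return float(np.dot(weights, labels) / weights.sum())

# Toy usage: two annotated clips and one unlabelled query feature.
feats = np.array([[0.1, 0.2], [0.9, 0.8]])
labels = np.array([0, 1])
print(kde_label_score(np.array([0.85, 0.75]), feats, labels, h=0.3))
```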

2) Video Annotation using Enriched Ontology:
Pictorially enriched ontologies for video annotation, proposed
in [2], are based on both linguistic and visual concepts. The
implementation of this algorithm performs automatic annotation
of soccer video based on these extended ontologies. Pictorially
enriched ontologies are expressed using the RDF (Resource
Description Framework) standard, so they can be shared and
used in a search engine to provide video summaries or to
perform content-based retrieval from video databases. An
enriched ontology is created by selecting a set of sequences
containing highlights described in the
linguistic ontology, then extracting visual features and performing
unsupervised clustering. Experiments have shown that pictorially
enriched ontologies can perform automatic clip annotation up to
the level of detail of pattern specification.
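
For illustration only, the sketch below encodes one annotated highlight clip as RDF triples using the third-party rdflib library (assumed to be available); the soccer namespace, property names and clip identifiers are hypothetical placeholders rather than the schema actually defined in [2].

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical namespace for an enriched soccer ontology; [2] defines its own schema.
SOCCER = Namespace("http://example.org/soccer-ontology#")

g = Graph()
g.bind("soccer", SOCCER)

# Annotate one clip as an instance of a linguistic concept ("Highlight")
# linked to a visual prototype obtained by unsupervised clustering.
clip = SOCCER["clip_042"]
g.add((clip, RDF.type, SOCCER.Highlight))
g.add((clip, SOCCER.hasVisualPrototype, SOCCER["cluster_07_centroid"]))
g.add((clip, SOCCER.startFrame, Literal(1520)))
g.add((clip, SOCCER.endFrame, Literal(1710)))

# Serialised Turtle can be shared with a search engine for retrieval.
print(g.serialize(format="turtle"))
```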

B. Similarity between video stream subsets
1) Statistical Summarization of Content Features: Near-duplicate
videos are detected in [3] based on a novel statistical
summarization of the content features of each clip. The summary
is very compact and effective because it captures the dominating
content and the content-changing trends of a video. Unlike
traditional frame-to-frame comparisons, the similarity measure of
this method is independent of the video length and only linear in
the dimensionality of the feature space. As a result, the method
can accurately find near-duplicates in a large collection of tens of
thousands of video clips extremely fast.
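
A simplified sketch of the idea is given below: each clip is reduced to a fixed-length signature built from the mean of its frame features and a simple change-trend term, and signatures are then compared in time independent of clip length. The exact statistics and the signature layout are assumptions for illustration, not the precise summarization defined in [3].

```python
import numpy as np

def clip_signature(frame_features):
    """Compact per-clip summary: mean content vector plus a simple
    change-trend vector (mean frame-to-frame difference).

    frame_features: (n_frames, d) array of per-frame descriptors.
    This is a simplified proxy for the statistical summary in [3].
    """
    feats = np.asarray(frame_features, dtype=float)
    mean_content = feats.mean(axis=0)
    trend = (np.diff(feats, axis=0).mean(axis=0)
             if len(feats) > 1 else np.zeros(feats.shape[1]))
    return np.concatenate([mean_content, trend])

def signature_distance(sig_a, sig_b):
    """Comparing signatures costs O(d); it no longer depends on clip lengths."""
    return float(np.linalg.norm(sig_a - sig_b))

# Toy usage: identical clips produce identical signatures.
a = clip_signature(np.random.rand(120, 16))
print(signature_distance(a, a) == 0.0)
```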

2) Distance Measure: An edit-distance-based method for
measuring the similarity between video sequences is discussed in
[4]. The vstring edit distance incorporates two basic components,
which account for the temporal constraints in the video and for
the visual similarity between video scenes. Experimental analysis
showed that the vstring edit distance is able to rank different
video scenes according to their similarity and that it provides a
reliable measure of the distance between video sequences.
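
The sketch below shows a generic edit distance between two videos represented as sequences of scene descriptors, with the substitution cost taken from visual dissimilarity and the insert/delete costs standing in for temporal constraints. It is a simplified stand-in for the vstring edit distance of [4]; the gap cost and the Euclidean dissimilarity are assumptions.

```python
import numpy as np

def video_edit_distance(scenes_a, scenes_b, gap_cost=1.0):
    """Edit distance between two videos given as sequences of scene
    descriptors. Substitutions are charged by visual dissimilarity;
    insertions/deletions model temporal structure. Simplified from [4].
    """
    def dissim(x, y):
        return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

    n, m = len(scenes_a), len(scenes_b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * gap_cost
    D[0, :] = np.arange(m + 1) * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + gap_cost,      # delete a scene
                          D[i, j - 1] + gap_cost,      # insert a scene
                          D[i - 1, j - 1] + dissim(scenes_a[i - 1], scenes_b[j - 1]))
    return D[n, m]

# Toy usage: identical scene sequences have distance zero.
print(video_edit_distance([[0.1, 0.2], [0.8, 0.9]], [[0.1, 0.2], [0.8, 0.9]]))
```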

3) Effective Indexing of video sequences: In [5] a video
summary model called Video Triplet (ViTri) is presented. ViTri
represents a cluster of similar frames as a tightly bounded
hypersphere defined by its position, radius and density. Similarity
is measured by the volume of intersection between two
hyperspheres, from which the total number of similar frames is
estimated in order to obtain the overall similarity between two
video sequences. In this way the time complexity of the video
similarity measure is reduced from a frame-by-frame comparison
of the two sequences to a comparison of their compact summaries.
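
As a rough sketch, the function below scores two ViTri-style clusters by how deeply their bounding hyperspheres interpenetrate, scaled by their densities. The overlap term is a crude one-dimensional proxy for the true intersection volume used in [5], and the density scaling is an assumption for illustration.

```python
import numpy as np

def vitri_similarity(pos_a, r_a, dens_a, pos_b, r_b, dens_b):
    """Rough ViTri-style similarity: estimate how many frames two clusters
    share from how much their bounding hyperspheres overlap, scaled by the
    cluster densities. The overlap term is a crude proxy for the true
    hypersphere intersection volume used in [5].
    """
    d = float(np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))
    overlap = max(0.0, (r_a + r_b) - d)   # how deeply the spheres interpenetrate
    if overlap == 0.0:
        return 0.0
    return overlap * min(dens_a, dens_b)

# Toy usage: two overlapping clusters of frames.
print(vitri_similarity([0, 0], 1.0, 30.0, [1.2, 0], 0.8, 25.0))
```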

C. Video Retrieval and Recognition Using Matching Algorithms

1) Pyramid matching: In [6], a pyramid match kernel is
presented: a new kernel function over unordered feature sets
that allows such sets to be used effectively and efficiently in
kernel-based learning methods. Each feature set of a video is
mapped to a multi-resolution histogram that preserves the
distinctness of the individual features at the finest level. By
computing a weighted intersection over the multi-resolution
histograms, the pyramid match kernel approximates the optimal
partial matching between two feature sets extracted from video.
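
A minimal sketch of the weighted-intersection computation is given below: both feature sets are binned at increasingly coarse resolutions, and the matches that first appear at level i are down-weighted by 1/2**i so that coarser correspondences count less. The number of levels, the bin layout and the assumption that features lie in [0, side)^d are illustrative choices, not the exact construction of [6].

```python
import numpy as np

def pyramid_match(feats_x, feats_y, levels=4, side=16.0):
    """Pyramid match over two unordered sets of d-dimensional features in
    [0, side)^d: bin both sets at increasingly coarse resolutions, count
    matches by histogram intersection, and weight the matches new at level i
    by 1 / 2**i (level 0 is the finest)."""
    feats_x, feats_y = np.asarray(feats_x, dtype=float), np.asarray(feats_y, dtype=float)
    prev_matches, score = 0.0, 0.0
    for i in range(levels):
        cell = side / (2 ** (levels - 1 - i))   # bin width doubles at each level
        hx, hy = {}, {}
        for h, feats in ((hx, feats_x), (hy, feats_y)):
            for f in feats:
                key = tuple((f // cell).astype(int))
                h[key] = h.get(key, 0) + 1
        matches = sum(min(hx[k], hy.get(k, 0)) for k in hx)   # histogram intersection
        score += (matches - prev_matches) / (2 ** i)          # only the new matches
        prev_matches = matches
    return score

# Toy usage with random 2-D features in [0, 16)^2.
print(pyramid_match(np.random.rand(40, 2) * 16, np.random.rand(55, 2) * 16))
```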

2) Spatial Pyramid Matching: In [7] a method for recognizing
scene categories based on approximate global geometric
correspondence is discussed. The method works by partitioning
the image into increasingly fine sub-regions and computing
histograms of the local features found inside each sub-region.
The spatial pyramid is a simple and computationally efficient
extension of an orderless bag-of-features image representation,
and it shows significantly improved performance on challenging
scene categorization tasks.
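
The sketch below builds such a representation from local features quantised to visual words: at level l the image is split into 2^l by 2^l cells, a word histogram is kept per cell, and finer levels are weighted more heavily following the 1/2^(L-l) scheme described in [7]. The input format and the number of levels are assumptions made for the example.

```python
import numpy as np

def spm_histogram(points, words, vocab_size, levels=3, width=1.0, height=1.0):
    """Spatial-pyramid representation: per-cell visual-word histograms at
    grids of 1x1, 2x2, 4x4, ..., with the level weights of [7]
    (w_0 = 1/2**L, w_l = 1/2**(L - l + 1) for l >= 1).

    points: list of (x, y) feature locations; words: matching word indices.
    """
    L = levels - 1
    blocks = []
    for l in range(levels):
        grid = 2 ** l
        hist = np.zeros((grid, grid, vocab_size))
        for (x, y), w in zip(points, words):
            cx = min(int(x / width * grid), grid - 1)
            cy = min(int(y / height * grid), grid - 1)
            hist[cy, cx, w] += 1
        weight = 1.0 / (2 ** L) if l == 0 else 1.0 / (2 ** (L - l + 1))
        blocks.append(weight * hist.ravel())
    return np.concatenate(blocks)

# Two pyramids would then be compared by histogram intersection,
# as in the pyramid match sketch above.
```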

3) Spatio-temporal pyramid matching: In [8] spatio-temporal
pyramid matching (STPM), a modified extension of the spatial
pyramid matching (SPM) method, is discussed; it considers
temporal information together with spatial locations in order to
match objects in video shots. Spatial pyramid matching has been
applied to recognize images of natural scenes, whereas
spatio-temporal pyramid matching partitions a video shot into 3D
grids over space and time and then calculates the weighted sum
of matches at each level. A sequence of grids is constructed for
each level, and the histogram matching function is computed
between two video shots at each level. Compared with SPM,
STPM provides a higher matching score.
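
A simplified sketch of this matching is shown below: local features carrying normalised (x, y, t) coordinates and a visual-word index are binned into 3D grids of increasing resolution, and the per-level histogram intersections are combined with finer levels weighted more. The level weights and the input format are assumptions for illustration, not necessarily those used in [8].

```python
import numpy as np

def stpm_match(feats_a, feats_b, vocab_size, levels=3):
    """Spatio-temporal pyramid match: cut each shot into 3-D (x, y, t) grids
    of increasing resolution, build a visual-word histogram per cell, and
    combine per-level histogram intersections with SPM-style weights.

    Features are (x, y, t, word) tuples with x, y, t already in [0, 1).
    """
    def level_hist(feats, grid):
        h = np.zeros((grid, grid, grid, vocab_size))
        for x, y, t, w in feats:
            cx, cy, ct = (min(int(v * grid), grid - 1) for v in (x, y, t))
            h[cx, cy, ct, w] += 1
        return h.ravel()

    L = levels - 1
    score = 0.0
    for l in range(levels):
        grid = 2 ** l
        ha, hb = level_hist(feats_a, grid), level_hist(feats_b, grid)
        weight = 1.0 / (2 ** L) if l == 0 else 1.0 / (2 ** (L - l + 1))
        score += weight * np.minimum(ha, hb).sum()
    return score

# Toy usage: 30 random features per shot, vocabulary of 50 visual words.
rng = np.random.default_rng(0)
shot = [(*rng.random(3), int(rng.integers(50))) for _ in range(30)]
print(stpm_match(shot, shot, vocab_size=50))
```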

4) Temporally aligned pyramid matching: A common observation
about video clips is that one clip is usually a combination of
several sub-clips, which correspond to the development of an
event over multiple stages. Temporally aligned pyramid matching,
presented in [9], uses temporal-constrained hierarchical
agglomerative clustering to build a multi-level pyramid in the
temporal domain. According to this multi-level pyramid structure,
each video clip is divided into several sub-clips, each depicting a
development stage of the event. It is also observed that, for
example in broadcast news videos, the event stages of two
different clips of the same event may not follow a constant
temporal order. To address this, the information from different
sub-clips is aligned explicitly. Finally, the information from the
different levels of the pyramid is fused, resulting in Temporally
Aligned Pyramid Matching (TAPM).
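
As a rough sketch of the clustering step, the function below starts from one cluster per sub-clip and repeatedly merges the closest pair of temporally adjacent clusters until the desired number of stages remains. The centroid distance and the stopping criterion are assumptions for the example; [9] additionally performs explicit alignment and multi-level fusion.

```python
import numpy as np

def temporal_constrained_hac(segment_feats, num_clusters):
    """Temporal-constrained hierarchical agglomerative clustering: begin with
    one cluster per sub-clip and repeatedly merge the pair of *temporally
    adjacent* clusters whose centroids are closest, until num_clusters remain.
    A simplified sketch of the clustering step used in [9].
    """
    clusters = [[i] for i in range(len(segment_feats))]
    feats = [np.asarray(f, dtype=float) for f in segment_feats]

    def centroid(c):
        return np.mean([feats[i] for i in c], axis=0)

    while len(clusters) > num_clusters:
        # Only neighbouring clusters may merge, preserving temporal order.
        dists = [np.linalg.norm(centroid(clusters[i]) - centroid(clusters[i + 1]))
                 for i in range(len(clusters) - 1)]
        j = int(np.argmin(dists))
        clusters[j] = clusters[j] + clusters[j + 1]
        del clusters[j + 1]
    return clusters

# Toy usage: six sub-clips collapse into three temporally contiguous stages.
print(temporal_constrained_hac(np.random.rand(6, 8), 3))
```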

III. CONCLUSION
Video retrieval over a large set of video clips is still a
challenging problem of interest in the vision community. Many
techniques for retrieving videos effectively from a large database
using similarity search have been discussed. In the above survey,
video annotation methods such as semi-supervised kernel density
estimation and enriched ontologies were discussed first. The
similarity between subsets of videos was then described through
statistical summarization of content features, an edit-based distance
measure and an effective Video Triplet based indexing method.
Finally, various matching algorithms such as pyramid matching,
spatial pyramid matching, spatio-temporal pyramid matching and
temporally aligned pyramid matching were discussed. Each of the
surveyed methods performs better in some categories and worse in
others. Among these methods, the proposed adaptive weighted
pyramid matching (AWPM) kernel outperforms the others in the
video retrieval process by recognizing human actions in a large set
of video clips.

REFERENCES
[1]. M. Wang, X.-S. Hua, Y. Song, X. Yuan, S. Li, H.-J. Zhang,
Automatic video annotation by semi-supervised learning with
kernel density estimation, in: Proceedings of the 14th Annual
ACM International Conference on Multimedia (MM 2006), ACM,
New York, NY, USA, 2006, pp. 967–976.

[2]. M. Bertini, A.D. Bimbo, C. Torniai, Automatic video annotation
using ontologies extended with visual information, in: Proceedings
of the 13th Annual ACM International Conference on Multimedia
(MM 2005), ACM, New York, NY, USA, 2005, pp. 395–398.

[3]. H.T. Shen, X. Zhou, Z. Huang, K. Shao, Statistical summarization
of content features for fast near-duplicate video detection, in:
Proceedings of the 15th ACM International Conference on
Multimedia (MM 2007), Augsburg, Bavaria, Germany, 2007, pp.
164–165.

[4]. D.A. Adjeroh, M. Lee, I. King, A distance measure for video
sequences, Computer Vision and Image Understanding 85 (1/2)
(1999) 25–45.

[5]. H.T. Shen, B.C. Ooi, X. Zhou, Towards effective indexing for
very large video sequence database, in: Proceedings of the ACM
SIGMOD 2005 International Conference on Management of Data,
Baltimore, Maryland, USA, 2005, pp. 730–741.

[6]. K. Grauman, T. Darrell, The pyramid match kernel: discriminative
classification with sets of image features, in: Proceedings of the
IEEE International Conference on Computer Vision (ICCV 2005),
Beijing, China, 2005, pp. 1458–1465.

[7]. S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial
pyramid matching for recognizing natural scene categories, in:
Proceedings of the 2006 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR 2006), IEEE
Computer Society, Washington, DC, USA, 2006, pp. 2169–2178.

[8]. J. Choi, W.J. Jeon, S.-C. Lee, Spatio-temporal pyramid matching for
sports videos, in: Proceedings of the ACM International Conference
on Multimedia Information Retrieval (MIR 2008), Vancouver,
Canada, 2008, pp. 291–297.

[9]. D. Xu, S.-F. Chang, Visual event recognition in news video using
kernel methods with multi-level temporal alignment, in: Proceedings
of the 2007 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition (CVPR 2007), 2007, pp. 1–8.
