
The 8th International Workshop on Systems, Signal Processing and their Applications 2013: Special Sessions

RECOGNITION OF AGGRESSIVE HUMAN BEHAVIOR BASED ON SURF AND SVM

A. Ouanane (1), A. Serir (2), and N. Djelal (3)

(1,2,3) University of Science and Technology Houari Boumediene, Algiers, Algeria
(1,2) Laboratory of Image Processing and Radiation
(3) Laboratory of Robotics, Parallelism and Electroenergetics
(1) aouanane@usthb.dz, (2) aserir@usthb.dz, (3) ndjelal@usthb.dz

978-1-4673-5540-7/13/$31.00 ©2013 IEEE

ABSTRACT

In this paper, we aim to develop a novel decision algorithm for human behavior recognition using both Speeded Up Robust Features (SURF) and PCA techniques. SURF offers the opportunity to obtain a high level of performance under scale variation, with low computing cost, to form spatio-temporal features. The PCA algorithm is then used to reduce the dimensionality of the provided features to form a robust pattern. The latter serves as input for training the Support Vector Machine (SVM), which is then able to classify aggressive and non-aggressive behaviors. Different tests are conducted on the KTH actions dataset. The obtained results show that the proposed technique provides a higher accuracy rate than current techniques, as well as more robustness to a dynamic environment.

Keywords: Aggressive behavior, SURF, PCA, SVM, KTH

1. INTRODUCTION

The growth of violent acts and aggressive behaviors of persons in streets, airports and business centers requires a large number of cameras and greater surveillance capacity. This challenge puts the operator in a position where it is no longer easy to monitor the video streams and to provide an adequate and efficient decision in terms of precision and speed.

To overcome these constraints, it is necessary to build an intelligent security system that aims to predict and recognize aggressive acts, taking into account the problems of the dynamic environment, lighting changes, occlusion and so on.

Human behavior recognition techniques can be used to overcome these limitations. Many of the existing methods are based on spatio-temporal approaches that characterize both the appearance and the motion features of human behaviors from a video sequence.

SIFT [1] is an invariant descriptor that constructs geometrically invariant features for matching between objects [2][3]. Scovanner et al. [4] extended the SIFT descriptor into 3D space, which includes the temporal information. The results of this descriptor showed a significant accuracy rate in action recognition. However, the computation of the SIFT descriptor is very complex, and extracting 2D or 3D features takes a long time.

The Speeded Up Robust Features (SURF) descriptor [5] is a competitive alternative to SIFT in terms of invariance to scaling, translation, and rotation. It allows us to provide a high level of spatio-temporal features that are used as invariant features to improve human behavior recognition. This work is an attempt to improve an intelligent security system based on the SURF framework implemented in our previous work for tracking persons [6]. The interest of the SURF algorithm is twofold: local features and invariance. It is based on the SIFT descriptor. The underlying idea of this algorithm is fairly simple: it is based on local gradient histograms sampled in a square grid around the interest points. Invariance to changes in scale and rotation is achieved by operating in a local reference frame relative to a dominant scale and rotation computed from the image. In this case, we intend to use the robustness of this technique to extract the spatio-temporal features to
model the variation of aggressive and non-aggressive acts or behaviors. The obtained spatio-temporal features may have a very high dimensionality, which is reduced by using the PCA algorithm so as to form a robust pattern. A binary support vector machine is trained on these patterns to discriminate the different behaviors into two classes: aggressive and non-aggressive.

The remaining sections of this paper are organized as follows. First, we present the different stages of the proposed algorithm, including a description of the SURF descriptor, in Section 2. Then, we illustrate experimental results with performance evaluation in Section 3. The conclusion is given in Section 4.

2. PROPOSED ALGORITHM

Herein, we present the main techniques used for human behavior recognition. Figure 1 shows the flowchart of the proposed algorithm. In the first stage, a feature detector is applied to extract interest points. The latter are then processed by the SURF algorithm to obtain features invariant to scaling, orientation and translation. The dimensionality of the provided features is then reduced by the PCA algorithm in order to form an appropriate pattern. The last stage consists of training a support vector machine on the obtained patterns.

Read Images -> Interest Point Detection -> Features Description Based On SURF -> Features Matching -> Reduce the Dimensionality by PCA -> SVM -> Aggressive Behaviors / Non-aggressive Behaviors

Figure 1. Flowchart of the proposed Algorithm

2.1 SURF Descriptor

In this section, the SURF algorithm is introduced first, and then we discuss how the spatio-temporal features are generated using this descriptor. The latter aims to extract significant spatial features that characterize the different human behaviors executed by a single person. First, we use popular feature detectors to find interest points within an image, and then normalize their scale and orientation to obtain invariant features using the proposed descriptor.

SURF Algorithm [7]

Input:  N video frames f1, …, fN
        Target position e1 of an object in the first frame
Output: Estimated positions e1, …, eN

Initialization (for frame f1):
  [1] Create a list ‘obj_lst’ of the features extracted in the area of e1 with initial mixture parameter mini.
For each new frame fi do:
  [2] Extract the features around the area of ei−1, and store them in a new list ‘est_lst’.
  [3] Match the features of lists ‘obj_lst’ and ‘est_lst’ to detect the features’ motions {vf,i}.
  [4] Evaluate the object motion ci with {vf,i} by Equation.
  [5] Target new position ei ← (ci, ei−1).
  [6] Update each feature in list ‘obj_lst’.

Where c = (ux, uy, ϕ, ρ) is the model of the target motion; ux and uy represent the spatial translation, and ϕ and ρ are the rotation and scale changes, respectively [7].

2.2 Human Behavior Pattern Creation

Broadly, the actions and behaviors of a person are characterized by similar spatial information, unlike the motion information, which changes from one person to another. These actions last for a short period of time and are executed in a repetitive manner. In this work, the short period of such an action is
fixed at 50 frames in the KTH actions dataset. In order to obtain the spatio-temporal features, two factors are considered: the pertinence and the dimensionality of the feature vectors. These factors allow us to better characterize human behaviors in terms of accuracy rate and computational cost in the classification stage.

Let us consider a given action from the KTH dataset, composed of N frames denoted fi and described as follows:
$\{ f_i \mid 1 \le i \le N \} = \{ f_1, f_2, f_3, \ldots, f_N \}$ (1)
The corresponding spatio-temporal features obtained through the SURF descriptor are denoted Sm, which is defined as a set of Si as described in the following formula:
$S_m = \{ S_1, S_2, S_3, \ldots, S_N \}, \quad 1 \le i \le N$ (2)
We recall that N has been set to N = 50 frames, which represents the short period of the action, and that m is the index of the training example used in the classification stage. Figure 2 shows the features extracted from the boxing action using the SURF descriptor.
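
As an illustration of Equations (1) and (2), the sketch below groups per-frame features into periods of N = 50 frames and concatenates each period into one pattern Sm. It rests on assumptions that the text does not state: each frame is already summarized by one fixed-length vector Si (a hypothetical 64 values here), periods do not overlap, and the function names `split_into_periods` and `build_pattern` are invented for this example.

```python
N = 50  # frames per action period, the value fixed in the text


def split_into_periods(frame_features, n=N):
    """Cut a sequence of per-frame feature vectors into consecutive,
    non-overlapping runs of n frames; a trailing partial run is dropped."""
    return [frame_features[start:start + n]
            for start in range(0, len(frame_features) - n + 1, n)]


def build_pattern(period):
    """Concatenate S_1, ..., S_N of one period into a single vector S_m."""
    return [value for frame_vector in period for value in frame_vector]


# Toy input: 120 frames, each summarized by a hypothetical 64-value vector.
frames = [[float(i)] * 64 for i in range(120)]
patterns = [build_pattern(p) for p in split_into_periods(frames)]
print(len(patterns), len(patterns[0]))  # 2 patterns of 50 * 64 = 3200 values
```

Under these assumptions a single pattern already holds 3200 values, which gives a sense of why the dimensionality of Sm becomes a concern.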

On the other hand, the dimensionality of Sm is very high. Dealing with such high-dimensional data leads to low accuracy and high computational complexity. At the same time, the SURF vectors carry a high level of spatial information, which is required to achieve significant accuracy in the classification stage. It is therefore useful to reduce the dimensionality of the provided vectors without losing their pertinent information.

Figure 2. SURF features pattern for boxing action

Among the existing methods used to reduce dimensionality, we have opted for the Principal Component Analysis (PCA) algorithm [8][9]. The basic concept of this algorithm is quite simple. First, the d-dimensional mean vector μ and the d × d covariance matrix Σ are computed for the full data set Sm = { S1, S2, S3, ···, Si, ··· | 1 ≤ i ≤ N }. Then, the eigenvectors and eigenvalues are computed and sorted according to decreasing eigenvalue. Call these eigenvectors v1 with eigenvalue λ1, v2 with eigenvalue λ2, and so on. Next, the k eigenvectors with the largest eigenvalues are chosen. In this work, this is done by looking at the spectrum of eigenvalues. Form a d × k matrix S whose columns consist of the k eigenvectors. Preprocess the data according to:
$X' = S^{T}(X - \mu)$ (3)
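
The projection of Equation (3) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation: the eigenvectors are found by power iteration with deflation, swapped in here for a full eigendecomposition (a linear algebra library would normally be used), k is passed in rather than chosen from the eigenvalue spectrum, and all function names are hypothetical.

```python
import math


def mean_vector(rows):
    """Component-wise mean (mu) of the data rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance matrix (Sigma), of size d x d."""
    n, d = len(rows), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]


def leading_eigenpairs(sigma, k, iterations=500):
    """k largest eigenpairs of a symmetric matrix, in decreasing order,
    found by power iteration with deflation."""
    d = len(sigma)
    work = [row[:] for row in sigma]
    pairs = []
    for index in range(k):
        # Fixed, non-degenerate start vector keeps the sketch deterministic.
        v = [1.0 + 0.1 * (i + index) for i in range(d)]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflation: subtract the component that was just found.
        for i in range(d):
            for j in range(d):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def pca_project(rows, k):
    """Return X' = S^T (X - mu) for every row, plus the kept eigenvalues."""
    mu = mean_vector(rows)
    pairs = leading_eigenpairs(covariance_matrix(rows, mu), k)
    projected = [[sum(e * (x - m) for e, x, m in zip(vec, row, mu))
                  for _, vec in pairs] for row in rows]
    return projected, [lam for lam, _ in pairs]


# Toy data: spread 3 along the first axis, 2 along the second, 1 along the third.
data = [[x, y, z] for x in (-3.0, 3.0) for y in (-2.0, 2.0) for z in (-1.0, 1.0)]
reduced, eigenvalues = pca_project(data, k=2)
```

On this toy data the two retained directions coincide with the first two axes, so each reduced vector keeps the two coordinates with the largest spread and discards the third.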
Equation (3) is applied to obtain the most significant features of Si. After applying the PCA algorithm, the dimension of Sm, including each Si, is reduced efficiently.

2.3 Classification

Many types of learning algorithms can be used to train a binary
classifier. In this work, we use the support vector machine (SVM) classifier [10] to discriminate human behavior into two classes: aggressive and non-aggressive. This choice is motivated by several factors. First, the SVM is effective in high-dimensional spaces and uses only a subset of the training points, called support vectors, in the decision function, which probably yields a faster classifier than other methods.

Moreover, the SVM aims to find a decision plane that has maximum distance from the nearest training patterns of human behaviors. The training data for the two-class problem (aggressive and non-aggressive behaviors) are given as $\{(x_i, y_i) \mid y_i \in \{+1, -1\},\ i = 1, \ldots, N\}$, where xi is the input feature, yi is the class label and N is the number of training samples.

3. EXPERIMENTAL RESULTS

In order to validate the performance of the proposed algorithm, several tests are conducted on the popular KTH actions dataset [11]. It consists of 25 persons performing 6 different actions {boxing, hand-clapping, jogging, running, walking, hand-waving}. Moreover, the KTH dataset includes both normal and abnormal behaviors, which could be aggressive or non-aggressive based on the dynamics of the action and its appearance, such as the boxing action. Accordingly, two classes have been formed from this dataset: the aggressive class consists of the boxing, hand-clapping and hand-waving actions, and the second class contains the walking, jogging and running actions. The second class is naturally considered non-aggressive behavior. Figure 3 shows a sample of the KTH dataset actions.

Figure 3. KTH DATASET

The different experiments are conducted using the cross-validation technique. The latter involves partitioning a sample of data into complementary subsets, for which 6 subjects from the KTH dataset have been chosen for the training set and 3 for the validation set. The generated feature vectors X′ have been fed into the binary support vector machine classifier model that was built on the training set to classify aggressive human behavior. The SVM classifier is then run with different kernels, including linear, quadratic, polynomial, and radial basis function (RBF) kernels, in order to optimize the accuracy rate.

As shown in Table 1, the accuracy rate achieved by the different SVM kernels is high. Moreover, the average recognition rate is highest with the RBF kernel, at 96.8%.

Table 1: Average Accuracy Rate with KTH dataset.

Classifier        Average accuracy rate
SVM-Linear        92%
SVM-Quadratic     95%
SVM-Polynomial    95%
SVM-RBF           96.8%

Due to the rarity of works dealing with aggressive behavior based on binary classification, it is useful to compare our method with that proposed by Datong et al. in [12]. Our method offers significant performance in terms of accuracy rate and dimensionality of the feature vectors, in contrast to the method of Datong et al. The latter characterizes the aggressiveness of actions on the KTH dataset using a local binary motion descriptor, with a one-center SVM to detect the aggressiveness of each action. Nevertheless, the actions and activities of persons are often characterized by two aspects, appearance and motion, which together yield spatio-temporal features; these are not involved in the work of Datong et al.

In order to further compare the performance of the proposed method with competitive methods, it is necessary to adapt the proposed algorithm so that it can discriminate the different actions of the KTH dataset. In this case, we have used a one-versus-all approach [13] to build a multi-class SVM training model, wherein each input vector X′ is fed with its related class.

Table 2 provides the accuracy rate of the proposed method in comparison with current techniques used in human behavior recognition. We can see that the proposed method provides satisfactory performance and proves its effectiveness. We can also see that the accuracy rate of the proposed method outperforms the majority of the
reported works, and is also slightly better than the method reported in [14].

Table 2: Comparison with the state of the art (KTH).

Methods                   Accuracy Rate
Proposed method           96%
Wang et al. [14]          94.2%
Schindler et al. [15]     92.7%
Laptev et al. [16]        91.8%

4. CONCLUSION

In this paper, we have proposed a novel decision algorithm that aims to recognize aggressive behaviors. The proposed algorithm is mainly based on the SURF descriptor, which is used to obtain invariant spatio-temporal features of human behavior.

Based on the experimental results, the performance of the proposed method can be summarized as follows: it obtains features invariant to scaling, translation and rotation, which allows us to perform human behavior analysis. By using the PCA algorithm, the dimensionality of the feature vectors is reduced, and the computational cost is therefore lower. In addition, the method provides more robustness under background noise and lighting changes.

For further research, we suggest the use of motion descriptors to analyze the dynamics of aggressive behavior in terms of periodicity, velocity and energy.

REFERENCES

[1] D. Lowe. “Distinctive image features from scale-invariant keypoints”. International J. on Computer Vision 60(2):91–110. 2004.
[2] T. Caetano, J. McAuley, L. Cheng, Q. Le, A. Smola. “Learning graph matching”. IEEE Trans. Pattern Analysis and Machine Intelligence 31:1048–1058. 2009.
[3] J. Morel, G. Yu. “ASIFT: a new framework for fully affine invariant image comparison”. SIAM J. on Imaging Sciences 2:438–469. 2009.
[4] P. Scovanner, S. Ali, and M. Shah. “A 3-dimensional SIFT descriptor and its application to action recognition”. Proceedings of the ACM International Multimedia Conference and Exhibition, pp. 357-360, 2007.
[5] H. Bay, T. Tuytelaars, L. Van Gool. “SURF: Speeded Up Robust Features”. ECCV 2006, pp. 404–417, 2006.
[6] N. Djelal, N. Saadia and A. Ouanane. “People Tracking Using SURF Algorithm”. ISPA'12, Mostaganem, Algeria, 2-4 December 2012.
[7] W. He, T. Yamashita, H. Lu and S. Lao. “SURF Tracking”. IEEE 12th International Conference on Computer Vision (ICCV), pp. 1586-1592, 2009.
[8] L. Sirovich and M. Kirby. “Low dimensional procedure for the characterization of human faces”. Journal of the Optical Society of America A, Optics, Image Science, and Vision, 4(3), 519–524. 1987.
[9] I.T. Jolliffe. “Principal component analysis”. Springer, New York. 2002.
[10] V.N. Vapnik. “The Nature of Statistical Learning Theory”. New York: Springer-Verlag, 1995.
[11] http://www.nada.kth.se/cvap/actions/.
[12] C. Datong, H. Wactlar, C. Ming-yu, G. Can, A. Bharucha, A. Hauptmann. “Recognition of aggressive human behavior using binary local motion descriptors”. Engineering in Medicine and Biology Society, EMBS. 30th Annual International Conference of the IEEE, pp. 5238-5241, 20-25 Aug. 2008.
[13] C.W. Hsu and C.J. Lin. “A comparison of methods for multi-class support vector machines”. IEEE Transactions on Neural Networks, Vol. 13, pp. 415–425. 2002.
[14] H. Wang, A. Klaser, C. Schmid and C.-L. Liu. “Action recognition by dense trajectories”. Proc. Int. Conf. Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, pp. 3169–3176. 2011.
[15] K. Schindler, L. v. Gool. “Action snippets: how many frames does human action recognition require”. In: CVPR, pp. 1–8, 2008.
[16] I. Laptev, M. Marszalek, C. Schmid, B. Rozenfeld. “Learning realistic human actions from movies”. IEEE Computer Vision and Pattern Recognition. 2008.
